In this version, the following steps are performed:
Top-feature selection based on the trained models' feature importance, for different numbers of selected CpGs and different feature-selection methods.
The feature-selection methods serve two main purposes: one targets binary classification, the other multi-class classification.
Several selection methods are provided, for example selection by mean feature importance, by median (quantile) feature importance, and by frequency of common features.
Two data frames are output for the Pareto-optimal step. One is the data frame filtered to the top number of features under each selection method.
The other is the phenotype data frame.
Finally, the performance of the features selected by the three methods is evaluated.
This part collects the inputs; change them as needed.
csv_Ni1905FilePath<-"C:\\Users\\wangtia\\Desktop\\AD Risk\\DataSets\\ADNI_covariate_withEpiage_1905obs.csv"
TopSelectedCpGs_filePath<-"C:\\Users\\wangtia\\Desktop\\AD Risk\\DataSets\\Top5K_CpGs.csv"
# Number of top CpGs kept, ranked by standard deviation
Number_N_TopNCpGs<-params$INPUT_Number_N_TopNCpGs
# "Impute_NA_FLAG_NUM" (set here in the INPUT section):
# to impute NAs with the mean, set "Impute_NA_FLAG_NUM = 1"
# to impute NAs with the KNN method, set "Impute_NA_FLAG_NUM = 2"
Impute_NA_FLAG_NUM = 1
# "METHOD_FEATURE_FLAG_NUM" (set here in the INPUT section):
# for 3-class classification, set "METHOD_FEATURE_FLAG_NUM = 1"
# for the PCA method, set "METHOD_FEATURE_FLAG_NUM = 2"
# for 2-class classification, set "METHOD_FEATURE_FLAG_NUM = 3"
# for classification of CN vs AD, set "METHOD_FEATURE_FLAG_NUM = 4"
# for classification of CN vs MCI, set "METHOD_FEATURE_FLAG_NUM = 5"
# for classification of MCI vs AD, set "METHOD_FEATURE_FLAG_NUM = 6"
METHOD_FEATURE_FLAG_NUM = 5
# In the INPUT section, set the number of common features needed
# (generally used for visualization)
NUM_COMMON_FEATURES_SET = 20
NUM_COMMON_FEATURES_SET_Frequency = 20
Flags for the feature-selection output:
# Flag for the phenotype data output:
# if TRUE, check whether the file already exists at the given path; write it only if it does not.
# if FALSE, do not output the phenotype file.
# NOTE: the phenotype file is extracted from "Merged_df".
phenoOutPUt_FLAG = TRUE
# For 8.0 Feature Selection and Output:
# NUM_FEATURES <- INPUT_NUMBER_FEATURES
#   the number of features to keep
# Method_Selected_Choose <- INPUT_Method_Selected_Choose
#   the selection method applied at the output stage
INPUT_NUMBER_FEATURES = params$INPUT_OUT_NUMBER_FEATURES
INPUT_Method_Mean_Choose = TRUE
INPUT_Method_Median_Choose = TRUE
INPUT_Method_Frequency_Choose = TRUE
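When this document is knitted, the `params$...` values above come from the Rmd YAML header. To run the chunks interactively, a stand-in `params` list can be defined by hand; a minimal sketch (the names are those referenced in this document, and the values 5000 / 250 match the metrics file name printed later in this section):

```r
# Hypothetical stand-in for the Rmd YAML `params` when running interactively.
params <- list(
  INPUT_Number_N_TopNCpGs   = 5000,  # top CpGs kept by standard deviation
  INPUT_OUT_NUMBER_FEATURES = 250    # features kept at the output stage
)

Number_N_TopNCpGs     <- params$INPUT_Number_N_TopNCpGs
INPUT_NUMBER_FEATURES <- params$INPUT_OUT_NUMBER_FEATURES
```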
if(INPUT_Method_Mean_Choose || INPUT_Method_Median_Choose || INPUT_Method_Frequency_Choose){
  OUTPUT_file_directory <- "C:\\Users\\wangtia\\Desktop\\AD Risk\\part2\\VersionHistory\\Version7_AutoKnit_Results\\Method5_CN_vs_MCI\\Method5_CN_vs_MCI_SelectedFeatures\\"
  OUTPUT_CSV_PATHNAME <- paste(OUTPUT_file_directory, "INPUT_", Number_N_TopNCpGs, "CpGs\\", sep = "")
  if (dir.exists(OUTPUT_CSV_PATHNAME)) {
    message("Directory already exists.")
  } else {
    dir.create(OUTPUT_CSV_PATHNAME, recursive = TRUE)
    message("Directory created.")
  }
}
## Directory already exists.
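The check-then-create pattern above can also be collapsed into a single call, since `dir.create()` with `showWarnings = FALSE` is a silent no-op on an existing directory. A small sketch using a temporary path:

```r
# dir.create() returns FALSE (rather than erroring) when the directory
# already exists; showWarnings = FALSE suppresses the warning, so the
# explicit dir.exists() branch is optional.
out_dir <- file.path(tempdir(), "demo_output", "CpGs")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)  # creates it
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)  # second call: no-op
```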
FLAG_WRITE_METRICS_DF is the flag that controls whether to write the CSV containing the performance metrics.
# Flag for outputting this file's metrics, including the model-training-stage metrics and the performance metrics of the key features selected by mean, by median, and by frequency.
Metrics_Table_Output_FLAG = TRUE
FLAG_WRITE_METRICS_DF = TRUE
"C:/Users/wangtia/Desktop/AD Risk/part2/VersionHistory/Version7_AutoKnit_Results/Method5_CN_vs_MCI/Method5_CN_vs_MCI_PerformanceMetrics"
## [1] "C:/Users/wangtia/Desktop/AD Risk/part2/VersionHistory/Version7_AutoKnit_Results/Method5_CN_vs_MCI/Method5_CN_vs_MCI_PerformanceMetrics"
if(FLAG_WRITE_METRICS_DF){
  OUTPUT_PerfMetrics_directory <- "C:\\Users\\wangtia\\Desktop\\AD Risk\\part2\\VersionHistory\\Version7_AutoKnit_Results\\Method5_CN_vs_MCI\\Method5_CN_vs_MCI_PerformanceMetrics\\"
  OUTPUT_PerformanceMetricsCSV_PATHNAME <- paste(OUTPUT_PerfMetrics_directory, "INPUT_", Number_N_TopNCpGs, "CpGs_", INPUT_NUMBER_FEATURES, "SelFeature_PerMetrics.csv", sep = "")
  if (dir.exists(OUTPUT_PerfMetrics_directory)) {
    message("Directory already exists.")
  } else {
    dir.create(OUTPUT_PerfMetrics_directory, recursive = TRUE)
    message("Directory created.")
  }
  print(OUTPUT_PerformanceMetricsCSV_PATHNAME)
}
## Directory already exists.
## [1] "C:\\Users\\wangtia\\Desktop\\AD Risk\\part2\\VersionHistory\\Version7_AutoKnit_Results\\Method5_CN_vs_MCI\\Method5_CN_vs_MCI_PerformanceMetrics\\INPUT_5000CpGs_250SelFeature_PerMetrics.csv"
Packages and libraries that may need to be installed and loaded.
# Function to check for and install Bioconductor packages (e.g. "limma")
install_bioc_packages <- function(packages) {
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  for (pkg in packages) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      BiocManager::install(pkg, dependencies = TRUE)
    } else {
      message(paste("Package", pkg, "is already installed."))
    }
  }
}
install_bioc_packages("limma")
## Package limma is already installed.
print("The required packages are all successfully installed.")
## [1] "The required packages are all successfully installed."
library(limma)
Set the seed for reproducibility.
set.seed(123)
csv_NI1905<-read.csv(csv_Ni1905FilePath)
csv_NI1905_RAW <- csv_NI1905
TopSelectedCpGs<-read.csv(TopSelectedCpGs_filePath, check.names = FALSE)
TopSelectedCpGs_RAW <- TopSelectedCpGs
head(csv_NI1905,n=3)
rownames(csv_NI1905)<-as.matrix(csv_NI1905[,"barcodes"])
dim(csv_NI1905)
## [1] 1905 23
dim(TopSelectedCpGs)
## [1] 5000 1921
head(TopSelectedCpGs[,1:8])
rownames(TopSelectedCpGs)<-TopSelectedCpGs[,1]
head(rownames(TopSelectedCpGs))
## [1] "cg08223187" "cg15794987" "cg04821830" "cg24629711" "cg17380855" "cg10360725"
head(colnames(TopSelectedCpGs))
## [1] "ProbeID" "200223270003_R01C01" "200223270003_R02C01" "200223270003_R03C01" "200223270003_R04C01" "200223270003_R05C01"
tail(colnames(TopSelectedCpGs))
## [1] "201046290111_R04C01" "201046290111_R05C01" "201046290111_R06C01" "201046290111_R07C01" "201046290111_R08C01" "sdDev"
This part adjusts the set of CpGs to use: it keeps the top N CpGs ranked by standard deviation.
sorted_TopSelectedCpGs <- TopSelectedCpGs[order(-TopSelectedCpGs$sdDev), ]
TopN_CpGs <- head(sorted_TopSelectedCpGs,Number_N_TopNCpGs )
TopN_CpGs_RAW<-TopN_CpGs
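The sdDev column is assumed to be precomputed in the input CSV. If it were absent, a per-probe standard deviation could be derived directly with `apply()`; a sketch on toy data (probe and sample names are illustrative):

```r
# Toy beta-value table: 4 probes x 3 samples.
set.seed(1)
toy <- data.frame(ProbeID = paste0("cg", 1:4),
                  s1 = runif(4), s2 = runif(4), s3 = runif(4))
# Row-wise SD over the sample columns only (margin 1 = rows).
toy$sdDev <- apply(toy[, c("s1", "s2", "s3")], 1, sd)
# Keep the top-N most variable probes, as done above with Number_N_TopNCpGs.
topN <- head(toy[order(-toy$sdDev), ], 2)
```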
Variable “TopN_CpGs” will be used for processing the data. Now let’s take a look at it.
dim(TopN_CpGs)
## [1] 5000 1921
rownames(TopN_CpGs)<-TopN_CpGs[,1]
head(rownames(TopN_CpGs))
## [1] "cg08223187" "cg15794987" "cg04821830" "cg24629711" "cg17380855" "cg10360725"
head(colnames(TopN_CpGs))
## [1] "ProbeID" "200223270003_R01C01" "200223270003_R02C01" "200223270003_R03C01" "200223270003_R04C01" "200223270003_R05C01"
tail(colnames(TopN_CpGs))
## [1] "201046290111_R04C01" "201046290111_R05C01" "201046290111_R06C01" "201046290111_R07C01" "201046290111_R08C01" "sdDev"
Now, let’s check for duplicated sample IDs (“barcodes”).
Start with the records that do not have a unique ID (“uniqueID = 0”):
library(dplyr)
dim(csv_NI1905[csv_NI1905$uniqueID == 0, ])
## [1] 1256 23
dim(csv_NI1905[csv_NI1905$uniqueID == 1, ])
## [1] 649 23
duplicates <- csv_NI1905[csv_NI1905$uniqueID == 0, ] %>%
group_by(barcodes) %>%
filter(n() > 1) %>%
ungroup()
print(dim(duplicates))
## [1] 0 23
rm(duplicates)
Based on the dimensions above, those records all have distinct sample IDs (“barcodes”).
Next, check whether any records in the full dataset share a sample ID (“barcodes”).
duplicates <- csv_NI1905 %>%
group_by(barcodes) %>%
filter(n() > 1) %>%
ungroup()
print(dim(duplicates))
## [1] 0 23
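The same duplicate check can be written in base R with `duplicated()`; a small equivalent of the dplyr pipeline above, on toy IDs:

```r
# any(duplicated(x)) is TRUE iff some value occurs more than once.
barcodes_unique <- c("A1", "B2", "C3")      # toy sample IDs, all distinct
any(duplicated(barcodes_unique))            # FALSE, like the result above

barcodes_dup <- c("A1", "B2", "A1")
sum(duplicated(barcodes_dup))               # counts repeated occurrences
```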
From the output above, we can see that the sample IDs (“barcodes”) are unique.
names(csv_NI1905)
## [1] "barcodes" "RID.a" "prop.B" "prop.NK" "prop.CD4T" "prop.CD8T" "prop.Mono" "prop.Neutro" "prop.Eosino" "DX" "age.now" "PTGENDER" "ABETA" "TAU"
## [15] "PTAU" "PC1" "PC2" "PC3" "ageGroup" "ageGroupsq" "DX_num" "uniqueID" "Horvath"
The same person may appear at several time points, so we keep only the records with a unique ID (“uniqueID = 1”).
csv_NI1905<-csv_NI1905[csv_NI1905$uniqueID == 1, ]
dim(csv_NI1905)
## [1] 649 23
Since “DX” will be the response variable, we first remove all rows with an NA value in the “DX” column.
# "DX" will be Y,remove all rows with NA value in "DX" column
csv_NI1905<-csv_NI1905 %>% filter(!is.na(DX))
We keep only the samples that appear in both datasets.
Matrix_sample_names_NI1905 <- as.matrix(csv_NI1905[,"barcodes"])
Matrix_sample_names_TopN_CpGs <- as.matrix(colnames(TopN_CpGs))
common_sample_names<-intersect(Matrix_sample_names_NI1905,Matrix_sample_names_TopN_CpGs)
csv_NI1905 <- csv_NI1905 %>% filter(barcodes %in% common_sample_names)
TopN_CpGs <- TopN_CpGs[, common_sample_names, drop = FALSE]
head(TopN_CpGs[,1:3],n=2)
dim(TopN_CpGs)
## [1] 5000 648
dim(csv_NI1905)
## [1] 648 23
Merge these two datasets and store the result in “merged_df”.
trans_TopN_CpGs<-t(TopN_CpGs)
# Check the total length of the rownames.
# Recall that the sample names have been matched and neither data frame has duplicates.
# Now order both data frames by rowname and bind them together; this ensures the merged data frame pairs their rows correctly.
trans_TopN_CpGs_ordered<-trans_TopN_CpGs[order(rownames(trans_TopN_CpGs)),]
csv_NI1905_ordered<-csv_NI1905[order(rownames(csv_NI1905)),]
print("The rownames match in order:")
## [1] "The rownames match in order:"
check_1 = length(rownames(csv_NI1905_ordered))
check_2 = sum(rownames(csv_NI1905_ordered)==rownames(trans_TopN_CpGs_ordered))
print(check_1==check_2)
## [1] TRUE
merged_df_raw<-cbind(trans_TopN_CpGs_ordered,csv_NI1905_ordered)
phenotic_features_RAW<-colnames(csv_NI1905)
print(phenotic_features_RAW)
## [1] "barcodes" "RID.a" "prop.B" "prop.NK" "prop.CD4T" "prop.CD8T" "prop.Mono" "prop.Neutro" "prop.Eosino" "DX" "age.now" "PTGENDER" "ABETA" "TAU"
## [15] "PTAU" "PC1" "PC2" "PC3" "ageGroup" "ageGroupsq" "DX_num" "uniqueID" "Horvath"
phenoticPart_RAW <- merged_df_raw[,phenotic_features_RAW]
dim(phenoticPart_RAW)
## [1] 648 23
head(phenoticPart_RAW)
head(merged_df_raw[,1:3])
merged_df<-merged_df_raw
head(colnames(merged_df))
## [1] "cg08223187" "cg15794987" "cg04821830" "cg24629711" "cg17380855" "cg10360725"
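Ordering both data frames by rowname before `cbind()` works here because the sample sets were already matched. An alternative that makes the alignment explicit is `match()`; a sketch on toy data (IDs and column names are illustrative):

```r
# Two toy tables sharing the same IDs in different row orders.
a <- data.frame(id = c("s1", "s2", "s3"), x = 1:3)
b <- data.frame(id = c("s3", "s1", "s2"), y = c(30, 10, 20))
rownames(a) <- a$id
rownames(b) <- b$id

# Reorder b's rows to follow a's rownames, then bind column-wise.
b_aligned <- b[match(rownames(a), rownames(b)), ]
merged    <- cbind(a, y = b_aligned$y)
```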
The CpG feature names can be retrieved via “featureName_CpGs”:
featureName_CpGs<-rownames(TopN_CpGs)
length(featureName_CpGs)
## [1] 5000
head(featureName_CpGs)
## [1] "cg08223187" "cg15794987" "cg04821830" "cg24629711" "cg17380855" "cg10360725"
clean_merged_df<-merged_df
missing_val_cols <- colnames(clean_merged_df)[colSums(is.na(clean_merged_df)) > 0]
colSums(is.na(clean_merged_df))[missing_val_cols]
## ABETA TAU PTAU
## 109 109 109
Choose the imputation method to apply to the data. The output dataset is named “clean_merged_df”.
# "Impute_NA_FLAG_NUM" (set in the INPUT section above):
# "Impute_NA_FLAG_NUM = 1" imputes NAs with the mean
# "Impute_NA_FLAG_NUM = 2" imputes NAs with the KNN method
Impute_NA_FLAG = Impute_NA_FLAG_NUM
if (Impute_NA_FLAG == 1){
clean_merged_df_imputed_mean<-clean_merged_df
mean_ABETA_rmNA <- mean(clean_merged_df$ABETA, na.rm = TRUE)
clean_merged_df_imputed_mean$ABETA[
is.na(clean_merged_df_imputed_mean$ABETA)] <- mean_ABETA_rmNA
mean_TAU_rmNA <- mean(clean_merged_df$TAU, na.rm = TRUE)
clean_merged_df_imputed_mean$TAU[
is.na(clean_merged_df_imputed_mean$TAU)] <- mean_TAU_rmNA
mean_PTAU_rmNA <- mean(clean_merged_df$PTAU, na.rm = TRUE)
clean_merged_df_imputed_mean$PTAU[
is.na(clean_merged_df_imputed_mean$PTAU)] <- mean_PTAU_rmNA
clean_merged_df = clean_merged_df_imputed_mean
}
library(VIM)
if (Impute_NA_FLAG == 2){
df_imputed_KNN <- kNN(merged_df, k = 5)
imputed_summary <- colSums(df_imputed_KNN[, grep("_imp", names(df_imputed_KNN))])
print(imputed_summary[imputed_summary > 0])
clean_merged_df<-df_imputed_KNN[, -grep("_imp", names(df_imputed_KNN))]
}
missing_val_cols <- colnames(clean_merged_df)[colSums(is.na(clean_merged_df)) > 0]
colSums(is.na(clean_merged_df))[missing_val_cols]
## named numeric(0)
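The per-column mean imputation above (ABETA, TAU, PTAU) repeats the same three lines per column; it could be written once as a small helper. A generic sketch (the helper name `impute_mean` is illustrative, not from the original code):

```r
# Impute NAs with the column mean for a chosen set of numeric columns.
impute_mean <- function(df, cols) {
  for (col in cols) {
    m <- mean(df[[col]], na.rm = TRUE)
    df[[col]][is.na(df[[col]])] <- m
  }
  df
}

toy     <- data.frame(ABETA = c(1, NA, 3), TAU = c(NA, 4, 8))
toy_imp <- impute_mean(toy, c("ABETA", "TAU"))
```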
Choose the feature-selection method to use.
# "METHOD_FEATURE_FLAG_NUM" (set in the INPUT section above):
# "METHOD_FEATURE_FLAG_NUM = 1" uses 3-class classification
# "METHOD_FEATURE_FLAG_NUM = 2" uses the PCA method
# "METHOD_FEATURE_FLAG_NUM = 3" uses 2-class classification
# (values 4-6 select the pairwise CN/MCI/AD comparisons listed in the INPUT section)
METHOD_FEATURE_FLAG = METHOD_FEATURE_FLAG_NUM
if (METHOD_FEATURE_FLAG == 1){
df_fs_method1 <- clean_merged_df
}
if(METHOD_FEATURE_FLAG == 1){
phenotic_features_m1<-c("DX","age.now","PTGENDER",
"PC1","PC2","PC3")
pickedFeatureName_m1<-c(phenotic_features_m1,featureName_CpGs)
df_fs_method1<-clean_merged_df[,pickedFeatureName_m1]
df_fs_method1$DX<-as.factor(df_fs_method1$DX)
df_fs_method1$PTGENDER<-as.factor(df_fs_method1$PTGENDER)
head(df_fs_method1[,1:5],n=3)
dim(df_fs_method1)
}
if(METHOD_FEATURE_FLAG == 1){
dim(df_fs_method1)
}
Create the contrast matrix for comparing CN vs. MCI vs. Dementia.
if(METHOD_FEATURE_FLAG == 1){
pheno_data_m1 <- df_fs_method1[,phenotic_features_m1]
head(pheno_data_m1[,1:5],n=3)
pheno_data_m1$DX <- factor(pheno_data_m1$DX, levels = c("CN", "MCI", "Dementia"))
design_m1 <- model.matrix(~ 0 + DX + age.now + PTGENDER + PC1 + PC2 + PC3,
data = pheno_data_m1)
colnames(design_m1)[colnames(design_m1) == "DXCN"] <- "CN"
colnames(design_m1)[colnames(design_m1) == "DXDementia"] <- "Dementia"
colnames(design_m1)[colnames(design_m1) == "DXMCI"] <- "MCI"
head(design_m1)
cpg_matrix_m1 <- t(as.matrix(df_fs_method1[, featureName_CpGs]))
fit_m1 <- lmFit(cpg_matrix_m1, design_m1)
}
if(METHOD_FEATURE_FLAG == 1){
# for here, we have three labels. The contrasts to compare groups will be:
contrast_matrix_m1 <- makeContrasts(
MCI_vs_CN = MCI - CN,
Dementia_vs_CN = Dementia - CN,
Dementia_vs_MCI = Dementia - MCI,
levels = design_m1
)
fit2_m1 <- contrasts.fit(fit_m1, contrast_matrix_m1)
fit2_m1 <- eBayes(fit2_m1)
topTable(fit2_m1, coef = "MCI_vs_CN")
topTable(fit2_m1, coef = "Dementia_vs_CN")
topTable(fit2_m1, coef = "Dementia_vs_MCI")
summary_results_m1 <- decideTests(fit2_m1,method = "nestedF", adjust.method = "none", p.value = 0.05)
table(summary_results_m1)
}
if(METHOD_FEATURE_FLAG == 1){
significant_dmp_filter_m1 <- summary_results_m1 != 0
significant_cpgs_m1_DMP <- unique(rownames(summary_results_m1)[
apply(significant_dmp_filter_m1, 1, any)])
print(paste("The significant CpGs after DMP are:",
paste(significant_cpgs_m1_DMP, collapse = ", ")))
print(paste("Length of CpGs after DMP:",
length(significant_cpgs_m1_DMP)))
pickedFeatureName_m1_afterDMP<-c(phenotic_features_m1,significant_cpgs_m1_DMP)
df_fs_method1<-df_fs_method1[,pickedFeatureName_m1_afterDMP]
dim(df_fs_method1)
}
if(METHOD_FEATURE_FLAG == 1){
library(recipes)
df_picked <- df_fs_method1
rec <- recipe(DX ~ ., data = df_picked) %>%
step_zv(all_predictors()) %>%
# step_range(all_numeric(), -all_outcomes()) %>%
step_dummy(all_nominal(), -all_outcomes())%>%
step_corr(all_predictors(), threshold = 0.7)
rec_prep <- prep(rec, df_picked)
processed_data_m1 <- bake(rec_prep, new_data = df_picked)
dim(processed_data_m1)
processed_data_m1_df<-as.data.frame(processed_data_m1)
rownames(processed_data_m1_df)<-rownames(df_picked)
}
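`step_corr(threshold = 0.7)` removes enough predictors that no remaining pair has absolute correlation above 0.7. The idea can be approximated in base R with a greedy filter; this is an illustrative sketch, not recipes' exact algorithm:

```r
# Greedy correlation filter: repeatedly drop one column from the most
# correlated remaining pair until all pairwise |r| <= threshold.
drop_correlated <- function(X, threshold = 0.7) {
  repeat {
    C <- abs(cor(X))
    diag(C) <- 0
    if (max(C) <= threshold) return(X)
    worst <- which(C == max(C), arr.ind = TRUE)[1, 1]
    X <- X[, -worst, drop = FALSE]
  }
}

set.seed(42)
x1 <- rnorm(100)
X  <- data.frame(x1 = x1,
                 x2 = x1 + rnorm(100, sd = 0.01),  # nearly identical to x1
                 x3 = rnorm(100))                  # independent
X_filtered <- drop_correlated(X, 0.7)              # one of x1/x2 is dropped
```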
if(METHOD_FEATURE_FLAG == 1){
AfterProcess_FeatureName_m1<-colnames(processed_data_m1)
head(AfterProcess_FeatureName_m1)
tail(AfterProcess_FeatureName_m1)
}
if(METHOD_FEATURE_FLAG == 1){
head(processed_data_m1[,1:5])
}
if(METHOD_FEATURE_FLAG == 1){
lastColumn_NUM<-dim(processed_data_m1)[2]
last5Column_NUM<-lastColumn_NUM-5
head(processed_data_m1[,last5Column_NUM :lastColumn_NUM])
}
if(METHOD_FEATURE_FLAG == 2){
bloodPropFeatureName<-c("RID.a","prop.B","prop.NK",
"prop.CD4T","prop.CD8T","prop.Mono",
"prop.Neutro","prop.Eosino")
pickedFeatureName_m2<-c("DX","age.now",
"PTGENDER",bloodPropFeatureName,
"ABETA","TAU","PTAU",featureName_CpGs)
df_fs_method2<-clean_merged_df[,pickedFeatureName_m2]
}
if(METHOD_FEATURE_FLAG == 2){
library(recipes)
rec <- recipe(DX ~ ., data = df_fs_method2) %>%
step_zv(all_predictors()) %>%
step_normalize(all_numeric(), -all_outcomes()) %>%
step_dummy(all_nominal(), -all_outcomes())%>%
step_corr(all_predictors(), threshold = 0.7)
rec_prep <- prep(rec, df_fs_method2)
processed_data_m2 <- bake(rec_prep, new_data = df_fs_method2)
dim(processed_data_m2)
}
if(METHOD_FEATURE_FLAG == 2){
X_df_m2<-subset(processed_data_m2,select = -DX)
Y_df_m2<-processed_data_m2$DX
pca_result <- prcomp(X_df_m2, center = TRUE, scale. = TRUE)
summary(pca_result)
screeplot(pca_result,type="lines")
}
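The 0.7 threshold used with `preProcess(method = "pca")` below corresponds to keeping enough components to explain 70% of the variance. The same component count can be read directly off a `prcomp` fit; a sketch on toy data:

```r
set.seed(7)
X   <- matrix(rnorm(200), nrow = 20)            # 20 samples x 10 features
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Proportion of variance per component, then the cumulative sum.
prop_var <- pca$sdev^2 / sum(pca$sdev^2)
n_comp   <- which(cumsum(prop_var) >= 0.7)[1]   # components needed for >= 70%
```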
if(METHOD_FEATURE_FLAG == 2){
PCA_component_threshold<-0.7
}
if(METHOD_FEATURE_FLAG == 2){
library(caret)
preproc<-preProcess(X_df_m2,method="pca",
thresh = PCA_component_threshold)
X_df_m2_transformed_PCA <- predict(preproc,X_df_m2)
data_processed_PCA<-data.frame(X_df_m2_transformed_PCA,Y_df_m2)
colnames(data_processed_PCA)[
which(colnames(data_processed_PCA)=="Y_df_m2")]<-"DX"
head(data_processed_PCA)
}
if(METHOD_FEATURE_FLAG == 2){
processed_data_m2<-data_processed_PCA
AfterProcess_FeatureName_m2<-colnames(data_processed_PCA)
}
if(METHOD_FEATURE_FLAG == 3){
df_fs_method3<-clean_merged_df
}
if(METHOD_FEATURE_FLAG == 3){
phenotic_features_m3<-c(
"DX","age.now","PTGENDER","PC1","PC2","PC3")
pickedFeatureName_m3<-c(phenotic_features_m3,featureName_CpGs)
df_picked_m3<-df_fs_method3[,pickedFeatureName_m3]
df_picked_m3$DX<-as.factor(df_picked_m3$DX)
df_picked_m3$PTGENDER<-as.factor(df_picked_m3$PTGENDER)
head(df_picked_m3[,1:5],n=3)
}
if(METHOD_FEATURE_FLAG == 3){
dim(df_picked_m3)
}
if(METHOD_FEATURE_FLAG == 3){
df_picked_m3<-df_picked_m3 %>% mutate(
DX = ifelse(DX == "CN", "CN",ifelse(DX
%in% c("MCI","Dementia"),"CI",NA)))
df_picked_m3$DX<-as.factor(df_picked_m3$DX)
df_picked_m3$PTGENDER<-as.factor(df_picked_m3$PTGENDER)
head(df_picked_m3[1:10],n=3)
}
if(METHOD_FEATURE_FLAG == 3){
pheno_data_m3 <- df_picked_m3[,phenotic_features_m3]
head(pheno_data_m3[,1:5],n=3)
design_m3 <- model.matrix(~0 + .,data=pheno_data_m3)
colnames(design_m3)[colnames(design_m3) == "DXCN"] <- "CN"
colnames(design_m3)[colnames(design_m3) == "DXCI"] <- "CI"
head(design_m3)
beta_values_m3 <- t(as.matrix(df_fs_method3[,featureName_CpGs]))
}
To perform the differential analysis (Differentially Methylated Positions, DMP), we have to define the contrast of interest. In Method 3 we focus on two groups, so there is a single contrast of interest.
if(METHOD_FEATURE_FLAG == 3){
fit_m3 <- lmFit(beta_values_m3, design_m3)
head(fit_m3$coefficients)
contrast.matrix <- makeContrasts(CI - CN, levels = design_m3)
fit2_m3 <- contrasts.fit(fit_m3, contrast.matrix)
# Apply the empirical Bayes’ step to get our differential expression statistics and p-values.
fit2_m3 <- eBayes(fit2_m3)
}
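The CI − CN contrast simply takes the difference between the two group-mean columns of the no-intercept design matrix. The arithmetic (though not limma's moderated statistics) can be illustrated in base R on toy data; group names and values here are illustrative only:

```r
# With a no-intercept design (~ 0 + group), each coefficient is a group mean,
# so the contrast c(-1, 1) is mean(CI) - mean(CN).
group  <- factor(c("CN", "CN", "CI", "CI"), levels = c("CN", "CI"))
y      <- c(0.20, 0.22, 0.30, 0.34)        # toy beta values for one CpG
design <- model.matrix(~ 0 + group)        # columns: groupCN, groupCI
fit    <- lm.fit(design, y)

contrast_est <- sum(fit$coefficients * c(-1, 1))  # CI - CN
```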
if(METHOD_FEATURE_FLAG == 3){
decideTests(fit2_m3)
}
if(METHOD_FEATURE_FLAG == 3){
dmp_results_m3_try1 <- decideTests(
fit2_m3, lfc = 0.01, adjust.method = "fdr", p.value = 0.1)
table(dmp_results_m3_try1)
}
if(METHOD_FEATURE_FLAG == 3){
# Identify DMPs, we will use this one:
dmp_results_m3 <- decideTests(
fit2_m3, lfc = 0.01, adjust.method = "none", p.value = 0.1)
table(dmp_results_m3)
}
if(METHOD_FEATURE_FLAG == 3){
significant_dmp_filter <- dmp_results_m3 != 0
significant_cpgs_m3_DMP <- rownames(dmp_results_m3)[
apply(significant_dmp_filter, 1, any)]
pickedFeatureName_m3_afterDMP<-c(phenotic_features_m3,significant_cpgs_m3_DMP)
df_picked_m3<-df_picked_m3[,pickedFeatureName_m3_afterDMP]
dim(df_picked_m3)
}
The “volcano plot” is one way to visualize the results of a differential analysis.
The x-axis shows the log-fold change in methylation levels between the two classes. The log fold change (logFC) can be calculated as \(\log_2 \left( \frac{\text{mean}(\text{Group1})}{\text{mean}(\text{Group2})} \right)\).
Interpretation of logFC:
Positive logFC: the measurement is higher in the first group than in the second; here this means hypermethylation (an increase in methylation).
Negative logFC: the measurement is lower in the first group than in the second; here this means hypomethylation (a decrease in methylation) in the experimental condition relative to the reference.
logFC of 0: no difference in the measurement between the two groups.
The y-axis shows a measure of statistical significance, such as the log-odds or “B” statistic. In the following we use the B statistic; the log-odds is \(B = \log_e(\text{posterior odds})\).
Interpretation of the B-value:
Higher B-value: stronger evidence for differential methylation.
Lower (or negative) B-value: weaker evidence for differential methylation.
B-value close to zero: uncertainty, or a lack of strong evidence, for differential methylation.
A characteristic “volcano” shape should appear. Let’s look at the results:
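Since B is the natural-log posterior odds, it converts to a posterior probability of differential methylation via exp(B)/(1 + exp(B)), i.e. the inverse logit. A small sketch with illustrative B values:

```r
# B = log posterior odds  =>  P(differentially methylated) = exp(B)/(1+exp(B)).
# plogis() is base R's inverse logit, so it computes exactly this.
B         <- c(-3, 0, 3)      # illustrative B statistics
post_prob <- plogis(B)
round(post_prob, 3)           # roughly 0.047, 0.500, 0.953
```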
if(METHOD_FEATURE_FLAG == 3){
full_results_m3 <- topTable(fit2_m3, number=Inf)
full_results_m3 <- tibble::rownames_to_column(full_results_m3,"ID")
head(full_results_m3)
}
if(METHOD_FEATURE_FLAG == 3){
sorted_full_results_m3 <- full_results_m3[
order(full_results_m3$logFC, decreasing = TRUE), ]
head(sorted_full_results_m3)
}
if(METHOD_FEATURE_FLAG == 3){
library(ggplot2)
ggplot(full_results_m3,aes(x = logFC, y=B)) + geom_point()
}
Now, let’s visualize the plot with the cutoffs applied:
if(METHOD_FEATURE_FLAG == 3){
library(dplyr)
library(ggrepel)
p_cutoff <- 0.1
fc_cutoff <- 0.01
topN <- 20
full_results_m3 <- full_results_m3 %>%
mutate(Significant = P.Value < p_cutoff & abs(logFC) > fc_cutoff) %>%
mutate(Rank = rank(-abs(logFC)),
Label = ifelse(Rank <= topN, as.character(ID), ""))
ggplot(full_results_m3, aes(x = logFC,
y = B, col = Significant, label = Label)) +
geom_point() +
geom_text_repel(col = "black")
}
Now, let’s change the y-axis to the P value:
if(METHOD_FEATURE_FLAG == 3){
ggplot(full_results_m3,aes(x = logFC, y=-log10(P.Value))) + geom_point()
}
if(METHOD_FEATURE_FLAG == 3){
library(dplyr)
library(ggrepel)
p_cutoff <- 0.1
fc_cutoff <- 0.01
topN <- 20
full_results_m3 <- full_results_m3 %>%
mutate(Significant = P.Value < p_cutoff & abs(logFC) > fc_cutoff) %>%
mutate(Rank = rank(-abs(logFC)),
Label = ifelse(Rank <= topN, as.character(ID), ""))
ggplot(full_results_m3,
aes(x = logFC, y = -log10(P.Value),
col = Significant,
label = Label)) +
geom_point() +
geom_text_repel(col = "black")
}
if(METHOD_FEATURE_FLAG == 3){
library(recipes)
rec <- recipe(DX ~ ., data = df_picked_m3) %>%
step_zv(all_predictors()) %>%
# step_range(all_numeric(), -all_outcomes()) %>%
step_dummy(all_nominal(), -all_outcomes())%>%
step_corr(all_predictors(), threshold = 0.7)
rec_prep <- prep(rec, df_picked_m3)
processed_data_m3 <- bake(rec_prep, new_data = df_picked_m3)
processed_data_m3_df <- as.data.frame(processed_data_m3)
rownames(processed_data_m3_df) <- rownames(df_picked_m3)
dim(processed_data_m3)
}
if(METHOD_FEATURE_FLAG == 3){
AfterProcess_FeatureName_m3<-colnames(processed_data_m3)
head(AfterProcess_FeatureName_m3)
tail(AfterProcess_FeatureName_m3)
}
if(METHOD_FEATURE_FLAG == 3){
levels(df_picked_m3$DX)
}
if(METHOD_FEATURE_FLAG == 3){
head(processed_data_m3[,1:5],n=3)
}
if(METHOD_FEATURE_FLAG == 3){
lastColumn_NUM_m3<-dim(processed_data_m3)[2]
last5Column_NUM_m3<-lastColumn_NUM_m3-5
head(processed_data_m3[,last5Column_NUM_m3 :lastColumn_NUM_m3])
}
if(METHOD_FEATURE_FLAG == 3){
levels(processed_data_m3$DX)
}
In this method, only the CN and Dementia (AD) classes are considered.
if(METHOD_FEATURE_FLAG == 4){
df_fs_method4<-clean_merged_df
}
if(METHOD_FEATURE_FLAG == 4){
phenotic_features_m4<-c(
"DX","age.now","PTGENDER","PC1","PC2","PC3")
pickedFeatureName_m4<-c(phenotic_features_m4,featureName_CpGs)
df_picked_m4<-df_fs_method4[,pickedFeatureName_m4]
df_picked_m4$DX<-as.factor(df_picked_m4$DX)
df_picked_m4$PTGENDER<-as.factor(df_picked_m4$PTGENDER)
head(df_picked_m4[,1:5],n=3)
}
if(METHOD_FEATURE_FLAG == 4){
dim(df_picked_m4)
}
if(METHOD_FEATURE_FLAG == 4){
df_picked_m4<-df_picked_m4 %>% filter(DX != "MCI") %>% droplevels()
df_picked_m4$DX<-as.factor(df_picked_m4$DX)
df_picked_m4$PTGENDER<-as.factor(df_picked_m4$PTGENDER)
head(df_picked_m4[1:10],n=3)
}
if(METHOD_FEATURE_FLAG == 4){
print(dim(df_picked_m4))
print(table(df_picked_m4$DX))
}
if(METHOD_FEATURE_FLAG == 4){
df_fs_method4 <- df_fs_method4 %>% filter(DX != "MCI") %>% droplevels()
df_fs_method4$DX<-as.factor(df_fs_method4$DX)
print(head(df_fs_method4))
print(dim(df_fs_method4))
}
if(METHOD_FEATURE_FLAG == 4){
pheno_data_m4 <- df_picked_m4[,phenotic_features_m4]
print(head(pheno_data_m4[,1:5],n=3))
design_m4 <- model.matrix(~0 + .,data=pheno_data_m4)
colnames(design_m4)[colnames(design_m4) == "DXCN"] <- "CN"
colnames(design_m4)[colnames(design_m4) == "DXDementia"] <- "Dementia"
print(head(design_m4))
beta_values_m4 <- t(as.matrix(df_fs_method4[,featureName_CpGs]))
}
To perform the differential analysis (Differentially Methylated Positions, DMP), we have to define the contrast of interest. In Method 4 we focus on two groups (CN and Dementia), so there is a single contrast of interest.
if(METHOD_FEATURE_FLAG == 4){
fit_m4 <- lmFit(beta_values_m4, design_m4)
head(fit_m4$coefficients)
contrast.matrix <- makeContrasts(Dementia - CN, levels = design_m4)
fit2_m4 <- contrasts.fit(fit_m4, contrast.matrix)
# Apply the empirical Bayes’ step to get our differential expression statistics and p-values.
fit2_m4 <- eBayes(fit2_m4)
}
if(METHOD_FEATURE_FLAG == 4){
decideTests(fit2_m4)
}
if(METHOD_FEATURE_FLAG == 4){
dmp_results_m4_try1 <- decideTests(
fit2_m4, lfc = 0.01, adjust.method = "fdr", p.value = 0.1)
table(dmp_results_m4_try1)
}
The FDR-adjusted criterion is too strict, so let’s relax the constraint.
if(METHOD_FEATURE_FLAG == 4){
# Identify DMPs, we will use this one:
dmp_results_m4 <- decideTests(
fit2_m4, lfc = 0.01, adjust.method = "none", p.value = 0.1)
table(dmp_results_m4)
}
if(METHOD_FEATURE_FLAG == 4){
significant_dmp_filter <- dmp_results_m4 != 0
significant_cpgs_m4_DMP <- rownames(dmp_results_m4)[
apply(significant_dmp_filter, 1, any)]
pickedFeatureName_m4_afterDMP<-c(phenotic_features_m4,significant_cpgs_m4_DMP)
df_picked_m4<-df_picked_m4[,pickedFeatureName_m4_afterDMP]
dim(df_picked_m4)
}
As in Method 3, a volcano plot visualizes the DMP results; see the interpretation of logFC and the B statistic above.
if(METHOD_FEATURE_FLAG == 4){
full_results_m4 <- topTable(fit2_m4, number=Inf)
full_results_m4 <- tibble::rownames_to_column(full_results_m4,"ID")
head(full_results_m4)
}
if(METHOD_FEATURE_FLAG == 4){
sorted_full_results_m4 <- full_results_m4[
order(full_results_m4$logFC, decreasing = TRUE), ]
head(sorted_full_results_m4)
}
if(METHOD_FEATURE_FLAG == 4){
library(ggplot2)
ggplot(full_results_m4,aes(x = logFC, y=B)) + geom_point()
}
Now, let’s visualize the plot with the cutoffs applied:
if(METHOD_FEATURE_FLAG == 4){
library(dplyr)
library(ggrepel)
p_cutoff <- 0.1
fc_cutoff <- 0.01
topN <- 20
full_results_m4 <- full_results_m4 %>%
mutate(Significant = P.Value < p_cutoff & abs(logFC) > fc_cutoff) %>%
mutate(Rank = rank(-abs(logFC)),
Label = ifelse(Rank <= topN, as.character(ID), ""))
ggplot(full_results_m4, aes(x = logFC,
y = B, col = Significant, label = Label)) +
geom_point() +
geom_text_repel(col = "black")
}
Now, let’s change the y-axis to the P value:
if(METHOD_FEATURE_FLAG == 4){
ggplot(full_results_m4,aes(x = logFC, y=-log10(P.Value))) + geom_point()
}
if(METHOD_FEATURE_FLAG == 4){
library(dplyr)
library(ggrepel)
p_cutoff <- 0.1
fc_cutoff <- 0.01
topN <- 20
full_results_m4 <- full_results_m4 %>%
mutate(Significant = P.Value < p_cutoff & abs(logFC) > fc_cutoff) %>%
mutate(Rank = rank(-abs(logFC)),
Label = ifelse(Rank <= topN, as.character(ID), ""))
ggplot(full_results_m4,
aes(x = logFC, y = -log10(P.Value),
col = Significant,
label = Label)) +
geom_point() +
geom_text_repel(col = "black")
}
if(METHOD_FEATURE_FLAG == 4){
library(recipes)
rec <- recipe(DX ~ ., data = df_picked_m4) %>%
step_zv(all_predictors()) %>%
# step_range(all_numeric(), -all_outcomes()) %>%
step_dummy(all_nominal(), -all_outcomes())%>%
step_corr(all_predictors(), threshold = 0.7)
rec_prep <- prep(rec, df_picked_m4)
processed_data_m4 <- bake(rec_prep, new_data = df_picked_m4)
processed_data_m4_df <- as.data.frame(processed_data_m4)
rownames(processed_data_m4_df) <- rownames(df_picked_m4)
print(dim(processed_data_m4))
}
if(METHOD_FEATURE_FLAG == 4){
AfterProcess_FeatureName_m4<-colnames(processed_data_m4)
print(length(AfterProcess_FeatureName_m4))
head(AfterProcess_FeatureName_m4)
tail(AfterProcess_FeatureName_m4)
}
if(METHOD_FEATURE_FLAG == 4){
levels(df_picked_m4$DX)
}
if(METHOD_FEATURE_FLAG == 4){
lastColumn_NUM_m4<-dim(processed_data_m4)[2]
last5Column_NUM_m4<-lastColumn_NUM_m4-5
head(processed_data_m4[,last5Column_NUM_m4 :lastColumn_NUM_m4])
}
if(METHOD_FEATURE_FLAG == 4){
print(levels(processed_data_m4$DX))
print(dim(processed_data_m4))
}
In this method, only the CN and MCI classes are considered.
if(METHOD_FEATURE_FLAG == 5){
df_fs_method5<-clean_merged_df
}
if(METHOD_FEATURE_FLAG == 5){
phenotic_features_m5<-c(
"DX","age.now","PTGENDER","PC1","PC2","PC3")
pickedFeatureName_m5<-c(phenotic_features_m5,featureName_CpGs)
df_picked_m5<-df_fs_method5[,pickedFeatureName_m5]
df_picked_m5$DX<-as.factor(df_picked_m5$DX)
df_picked_m5$PTGENDER<-as.factor(df_picked_m5$PTGENDER)
head(df_picked_m5[,1:5],n=3)
}
if(METHOD_FEATURE_FLAG == 5){
dim(df_picked_m5)
}
## [1] 648 5006
if(METHOD_FEATURE_FLAG == 5){
df_picked_m5<-df_picked_m5 %>% filter(DX != "Dementia") %>% droplevels()
df_picked_m5$DX<-as.factor(df_picked_m5$DX)
df_picked_m5$PTGENDER<-as.factor(df_picked_m5$PTGENDER)
head(df_picked_m5[1:10],n=3)
}
if(METHOD_FEATURE_FLAG == 5){
print(dim(df_picked_m5))
print(table(df_picked_m5$DX))
}
## [1] 554 5006
##
## CN MCI
## 221 333
if(METHOD_FEATURE_FLAG == 5){
df_fs_method5 <- df_fs_method5 %>% filter(DX != "Dementia") %>% droplevels()
df_fs_method5$DX<-as.factor(df_fs_method5$DX)
print(head(df_fs_method5))
print(dim(df_fs_method5))
}
## (printed output truncated: head(df_fs_method5) shows only the names of the ~5000 CpG columns of this wide data frame, omitted here)
## cg06596097 cg27030917 cg25872828 cg00399895 cg17616590 cg13462557 cg02464073 cg12852500 cg04212500 cg02120552 cg09885502 cg09141759 cg04246708 cg27160931 cg20239921 cg19594252 cg14401833
## cg00257789 cg00439656 cg08332381 cg09935224 cg13195461 cg11231155 cg22851875 cg03407524 cg13130271 cg12935118 cg07437923 cg02491969 cg12012426 cg26495272 cg00727777 cg06728147 cg13910001
## cg14773256 cg24534774 cg06032337 cg09817414 cg12434901 cg18075755 cg23159970 cg10788927 cg14048271 cg05125667 cg27230769 cg24586205 cg10995422 cg16037139 cg09480417 cg05392160 cg08266644
## cg25069157 cg04540199 cg21294301 cg21587006 cg01553548 cg08397053 cg00572336 cg11128983 cg11846372 cg25985455 cg04145681 cg14597070 cg08708231 cg00258945 cg18443741 cg00999469 cg01608425
## cg21473514 cg17421046 cg10123377 cg17040924 cg02869694 cg01891583 cg04388792 cg11673471 cg03326823 cg02628879 cg12208638 cg27639199 cg09229960 cg03329597 cg15184353 cg01426558 cg04131969
## cg10890644 cg00567916 cg25543264 cg18285382 cg10422744 cg03796003 cg06405219 cg11796481 cg02617100 cg04680230 cg15520372 cg15946815 cg24851651 cg11431762 cg21046080 cg15567368 cg08775595
## cg08049519 cg11366142 cg06307915 cg25203245 cg04557130 cg22071943 cg17256465 cg11418607 cg27368331 cg24643105 cg03549208 cg27079096 cg00356335 cg13514836 cg02877261 cg12342501 cg11640767
## cg14889167 cg10117599 cg09972436 cg08717807 cg11144103 cg07203999 cg08331829 cg14117320 cg01132052 cg02898382 cg11294950 cg21795255 cg23813394 cg20455959 cg02658985 cg15033116 cg22564046
## cg10765459 cg13989295 cg13232075 cg19949776 cg01502466 cg11251367 cg18285337 cg06584796 cg10772532 cg10005146 cg17653352 cg17480035 cg14252149 cg06701026 cg20086657 cg13145550 cg12219587
## cg02533724 cg02080302 cg10528424 cg11265381 cg06091288 cg05809586 cg22274196 cg16390578 cg16405337 cg25465065 cg16788319 cg20976286 cg20536971 cg13655169 cg01280906 cg07837161 cg05656210
## cg11420142 cg05448209 cg09411910 cg08103988 cg04049787 cg25614253 cg05425577 cg06849002 cg10584449 cg21463262 cg04831745 cg10713875 cg04104977 cg02702444 cg13372276 cg23803868 cg19236675
## cg17544920 cg05305893 cg04971183 cg03834574 cg11716267 cg03690824 cg10978613 cg14838992 cg14811585 cg11787737 cg16268937 cg23954274 cg01185010 cg02658043 cg02100397 cg00814598 cg17004290
## cg01821635 cg09070882 cg14460215 cg15967058 cg20592836 cg21223191 cg08238319 cg07640670 cg11720573 cg20015269 cg16170636 cg20474581 cg01483826 cg04814784 cg10900271 cg24818939 cg04036196
## cg25879395 cg07128503 cg09708852 cg21881821 cg10006614 cg11010744 cg12040381 cg02794321 cg06737250 cg08096668 cg05593887 cg02183930 cg00366603 cg09872687 cg02323098 cg18339359 cg06041068
## cg16167565 cg08969352 cg16819848 cg15074403 cg12284872 cg21193926 cg01350803 cg09672255 cg16069065 cg23549329 cg20218135 cg17404449 cg02890259 cg15068593 cg06378142 cg10844498 cg26705599
## cg09481605 cg19843426 cg09403466 cg11826549 cg16894909 cg08283200 cg15421137 cg23767642 cg09994954 cg00045070 cg09331190 cg26621757 cg03664889 cg26820259 cg00704664 cg16405288 cg07227024
## cg03029566 cg14129832 cg14629397 cg18662228 cg08407901 cg06101792 cg00030117 cg09949906 cg01113595 cg15014361 cg02981209 cg16140565 cg11982081 cg24080129 cg12568536 cg05724451 cg04990378
## cg21783012 cg16995742 cg23764766 cg03122926 cg22237644 cg05023192 cg04577745 cg13038767 cg04073914 cg01016092 cg01600516 cg21692241 cg06273195 cg19154950 cg05890457 cg24506579 cg05712748
## cg21106100 cg02866897 cg02302183 cg20414709 cg03221390 cg01594260 cg20831491 cg16142759 cg21290550 cg13972557 cg24057558 cg04728936 cg22251955 cg09668218 cg05806018 cg00017157 cg11920595
## cg04998327 cg02507579 cg01378439 cg16702660 cg23414024 cg05321907 cg00924943 cg27187580 cg18673341 cg10985055 cg23680829 cg08059778 cg17671621 cg09281805 cg16655343 cg24844518 cg05528899
## cg04777551 cg14834300 cg24741068 cg07878625 cg10811640 cg20221740 cg10444583 cg27467876 cg03781170 cg24426788 cg19364778 cg07501029 cg16899969 cg18467790 cg13031029 cg24186901 cg13323954
## cg00466309 cg15909443 cg20139683 cg26212480 cg07178458 cg27362989 cg10750306 cg04605872 cg00944631 cg21242448 cg17342292 cg15602423 cg13079150 cg24136292 cg10923851 cg08210706 cg12413138
## cg13514954 cg26777760 cg15988569 cg09084244 cg12124890 cg01081395 cg11835797 cg01667144 cg15669985 cg11786587 cg09735782 cg20673407 cg24074981 cg19248407 cg11802689 cg05704942 cg09418475
## cg06616511 cg00723973 cg24846009 cg24738483 cg23712855 cg00845806 cg06699671 cg01104393 cg10032780 cg04798314 cg17327157 cg09985794 cg23162598 cg07615678 cg08408305 cg02956194 cg17171259
## cg26222222 cg22627029 cg24682077 cg08657228 cg03701469 cg09289202 cg15912814 cg24790801 cg22256607 cg23766887 cg14555881 cg07882838 cg01715830 cg06002867 cg23052585 cg27341708 cg11266396
## cg17149765 cg01118640 cg21838924 cg14799809 cg02907150 cg19405842 cg06545341 cg04665049 cg19577958 cg25888700 cg13054220 cg21566433 cg12466610 cg07861180 cg01543583 cg08014404 cg03327352
## cg15693668 cg09829645 cg04531182 cg11988372 cg24627956 cg23996696 cg16762802 cg16864708 cg22452543 cg02832783 cg03391801 cg11602758 cg07618759 cg17419220 cg02994943 cg13765957 cg20300784
## cg04982228 cg16955800 cg22276800 cg07909498 cg24309769 cg12650227 cg16044734 cg14609402 cg16733676 cg18455878 cg02368820 cg05452174 cg00033213 cg26365090 cg18523259 cg26567385 cg05580683
## cg06286533 cg16490124 cg01127608 cg14029254 cg26418790 cg02320265 cg05130642 cg27433479 cg04158402 cg12179578 cg13767616 cg09730272 cg22508145 cg03386329 cg20205188 cg11884832 cg04845852
## cg14599155 cg02735334 cg22411599 cg15491125 cg27288829 cg08250118 cg04462915 cg27153751 cg23499373 cg09476440 cg03330259 cg22336867 cg02195366 cg20223677 cg13078798 cg00256329 cg08880261
## cg26217827 cg05088151 cg16971657 cg06065495 cg03525554 cg25649515 cg00293330 cg26077133 cg14232344 cg04907664 cg14863642 cg08779649 cg07456472 cg16214670 cg12169700 cg09623464 cg06777697
## cg08084984 cg09866143 cg16802892 cg07465457 cg02679322 cg22169467 cg11585022 cg19935756 cg24328927 cg13885788 cg23936477 cg15828613 cg00553601 cg20017683 cg11758647 cg02173328 cg07269319
## cg08603678 cg19539986 cg00691480 cg05696779 cg24264679 cg16590821 cg03770889 cg01004097 cg20766178 cg11666326 cg03749159 cg25561557 cg04028540 cg01413796 cg13244998 cg13203541 cg06060457
## cg06042004 cg05200992 cg18827503 cg13564529 cg01201512 cg16932018 cg26069044 cg14877834 cg00913770 cg02819655 cg19214707 cg19504860 cg03334316 cg12920781 cg22862357 cg15450782 cg08458132
## cg05363576 cg12262617 cg03108651 cg04531698 cg08855111 cg25450321 cg22901347 cg00156497 cg03088219 cg10444733 cg04926069 cg06683092 cg15355235 cg27501723 cg02909570 cg00546757 cg05340866
## cg04305804 cg05238218 cg08477332 cg27488875 cg12682323 cg08439705 cg12602563 cg01215118 cg11404906 cg25721006 cg05799859 cg04238896 cg22710716 cg18615537 cg14629612 cg09148704 cg10140678
## cg13023833 cg20337029 cg09588531 cg11637006 cg10306192 cg25664050 cg16818568 cg17779733 cg08043513 cg26096605 cg09068955 cg25674027 cg05504986 cg07225509 cg12074150 cg05331763 cg15360451
## cg09460641 cg17365146 cg16533028 cg14749747 cg25508573 cg13072209 cg13371681 cg01359658 cg05767119 cg24836826 cg08963013 cg13038195 cg17738613 cg02575605 cg27277239 cg12340462 cg03466780
## cg04683516 cg10058204 cg11102724 cg12195446 cg02772171 cg07826058 cg00009523 cg25601713 cg04945432 cg05895137 cg04123498 cg14615128 cg05918715 cg03699194 cg27056740 cg02154276 cg18346634
## cg10987536 cg17251507 cg09650803 cg09134165 cg16241932 cg20502501 cg02333283 cg12565083 cg07800510 cg02216016 cg08782677 cg16357225 cg12155450 cg07456585 cg17186592 cg12240569 cg18500967
## cg19992190 cg16915302 cg06498495 cg16956806 cg21499289 cg00114625 cg10155537 cg15925199 cg15568074 cg04425994 cg24991845 cg22026089 cg13569207 cg23623404 cg00018261 cg11294350 cg06012903
## cg00878023 cg01835443 cg24638099 cg17265515 cg04066495 cg06746449 cg00399450 cg01824170 cg11418303 cg17292622 cg14192979 cg07258715 cg12120033 cg25342508 cg17906851 cg01933473 cg13522370
## cg00813093 cg09413252 cg16661157 cg23892028 cg18683228 cg14797147 cg16089727 cg15083522 cg18857647 cg09060772 cg17155524 cg13239134 cg09636756 cg01872988 cg16894263 cg22481673 cg08136432
## cg10767615 cg14295915 cg00767423 cg13883027 cg17509989 cg06099085 cg20485607 cg10511229 cg01055691 cg02940070 cg27129755 cg00046099 cg13120932 cg16771215 cg24506221 cg25624849 cg08041188
## cg03737947 cg26679884 cg08919780 cg01926326 cg08394893 cg27114706 cg18514595 cg27308738 cg00413734 cg11653314 cg05393861 cg17151385 cg19218082 cg22305850 cg27625131 cg26846609 cg03493899
## cg15613905 cg26038514 cg12036633 cg03544800 cg10482512 cg05400155 cg06237602 cg08516018 cg22221554 cg00254095 cg00836161 cg10701801 cg17178900 cg02761375 cg24479484 cg15744124 cg08537127
## cg07056506 cg21070081 cg15703000 cg09072865 cg00112256 cg11074323 cg13102742 cg19860695 cg13261753 cg26089705 cg15281606 cg06279067 cg18213661 cg11558795 cg04867412 cg26590106 cg17771423
## cg23554546 cg12214399 cg02823329 cg04453550 cg09998151 cg03075889 cg26428339 cg26882525 cg17035182 cg19300401 cg20597646 cg00295418 cg08890411 cg25656978 cg15441831 cg06844213 cg18561199
## cg05349513 cg15517438 cg05091477 cg09539170 cg21139150 cg13160852 cg23019589 cg19793163 cg17181941 cg25321762 cg25059696 cg15418221 cg25001484 cg21927991 cg14061270 cg02154924 cg13688351
## cg01941243 cg18557837 cg05134736 cg24460485 cg10942642 cg22557383 cg25026580 cg15395171 cg05977333 cg26709433 cg01835922 cg14195178 cg21072408 cg15977272 cg00274640 cg10590622 cg06334238
## cg18576044 cg22138998 cg12077809 cg26856631 cg07697459 cg16767880 cg03119308 cg05059349 cg18131458 cg00035449 cg14509777 cg00055165 cg23403836 cg02882301 cg17149911 cg21513542 cg00841008
## cg21388339 cg25205946 cg20749341 cg23733925 cg10818676 cg23919742 cg10507965 cg21358336 cg08902358 cg07145234 cg10415021 cg08479532 cg08570077 cg21714731 cg26565914 cg13978098 cg09780150
## cg01341801 cg00293269 cg00192980 cg19774683 cg06352616 cg07954607 cg08980509 cg20207108 cg19834421 cg13815695 cg00265812 cg26642936 cg25206026 cg14351440 cg04876534 cg08002427 cg17155577
## cg00980980 cg00161838 cg04888234 cg15195148 cg10835413 cg13081560 cg08977311 cg22405556 cg11209190 cg10523200 cg08096656 cg04371001 cg19274180 cg02814135 cg01156747 cg04596655 cg09350919
## cg15133953 cg13074018 cg09352925 cg12259892 cg14780448 cg01296877 cg06362582 cg13067096 cg01223071 cg11450075 cg12897690 cg24926791 cg14375582 cg13603318 cg20187719 cg01379313 cg17590101
## cg10786572 cg00810519 cg08327960 cg03842120 cg01500431 cg22109827 cg27481428 cg07210229 cg19723528 cg09022647 cg05680665 cg01188578 cg09312897 cg00680673 cg19168249 cg03639185 cg05383895
## cg23939077 cg07505631 cg20270941 cg09157251 cg25755428 cg13950578 cg26422465 cg27248959 cg11377625 cg19750824 cg02902672 cg09364373 cg13915481 cg06634917 cg05476522 cg20741235 cg12158214
## cg17723206 cg05935445 cg01251131 cg25997988 cg14152758 cg03600007 cg01201914 cg06389521 cg18828306 cg20543970 cg02901522 cg13530263 cg24748621 cg16596266 cg18021992 cg15388766 cg16211147
## cg19384241 cg11438323 cg03526459 cg20213329 cg03478816 cg18717600 cg00322820 cg11965913 cg26287822 cg10991108 cg16398051 cg16641060 cg16104636 cg27086157 cg17479100 cg19518539 cg26690407
## cg25225807 cg15535896 cg24398793 cg20859738 cg12262000 cg15932613 cg15229668 cg06139288 cg16081854 cg26354017 cg18698799 cg07446674 cg27093646 cg06055266 cg00631877 cg12501287 cg11723923
## cg12277627 cg00348031 cg17508549 cg14228103 cg16310958 cg06500073 cg01721300 cg05847731 cg10690713 cg24065597 cg06671703 cg18821122 cg02370566 cg20704148 cg08152721 cg14056849 cg13506281
## cg09430642 cg08514194 cg05095647 cg03655023 cg11308037 cg23022053 cg14113515 cg13017022 cg27494055 cg01462799 cg26824678 cg03050491 cg07684215 cg21634283 cg09307883 cg07498088 cg15727053
## cg24245216 cg12482297 cg03714923 cg08210468 cg19505129 cg00814186 cg14964115 cg15609861 cg16431836 cg01479916 cg05385718 cg05841700 cg06051619 cg06487085 cg00866176 cg12914114 cg10738003
## cg06673178 cg27592925 cg10596483 cg25977769 cg25712015 cg23412653 cg21934405 cg19321437 cg16756025 cg01978703 cg10738648 cg16579946 cg08138245 cg12026625 cg07386410 cg24996718 cg15570860
## cg20060160 cg15132295 cg01463110 cg15844450 cg21137943 cg23836570 cg15165694 cg02262167 cg24284539 cg22471695 cg03748372 cg08842642 cg20798066 cg10326673 cg09785377 cg15266057 cg26278987
## cg05374090 cg04487202 cg24018148 cg00474373 cg06427702 cg16536985 cg15255859 cg05308244 cg22807592 cg20370184 cg25940844 cg00835812 cg17965552 cg09456260 cg15555926 cg14544439 cg05961492
## cg07176285 cg10240906 cg15652532 cg08849813 cg01869765 cg22120018 cg10776061 cg06616857 cg05025374 cg26908356 cg10482495 cg03335173 cg02122327 cg12784167 cg01058588 cg05792312 cg21234342
## cg08076861 cg23947872 cg18190829 cg26628435 cg25427918 cg15633912 cg02495179 cg21203249 cg10829391 cg12386614 cg26263138 cg11524947 cg03871183 cg26011946 cg23951868 cg12614702 cg17363084
## cg09091181 cg04875706 cg25317262 cg02494911 cg27578568 cg22972806 cg05967787 cg04402345 cg02775175 cg14241748 cg03706056 cg26495595 cg19471911 cg00956039 cg20823859 cg14159672 cg21360798
## cg15950547 cg15907464 cg19680693 cg07158505 cg06813297 cg02078724 cg01407424 cg15618087 cg21854924 cg01957222 cg24730756 cg07533224 cg03635532 cg04242342 cg05373298 cg16908938 cg14127016
## cg09056691 cg22045528 cg07133434 cg27501007 cg07674503 cg06002687 cg06316758 cg07590402 cg02933448 cg22618269 cg18932686 cg11184697 cg17970282 cg16956665 cg17036062 cg08405463 cg21543103
## cg07262858 cg04712194 cg14970569 cg05707218 cg13905298 cg11894108 cg03192273 cg15034216 cg05427163 cg20981163 cg24009806 cg17196155 cg05782975 cg00905457 cg06297686 cg06779802 cg04520693
## cg03661789 cg07648454 cg12551908 cg22671798 cg10860619 cg00345083 cg00337921 cg17399684 cg09247979 cg27017735 cg04610028 cg01472026 cg03643559 cg07750402 cg07240846 cg06915915 cg12261681
## cg07796782 cg16637584 cg05730108 cg10122899 cg01871867 cg15211026 cg02890235 cg19998137 cg06972843 cg06576965 cg10818284 cg03224005 cg04481077 cg10923350 cg05034175 cg09342610 cg12898220
## cg06052372 cg02246922 cg20566384 cg13156574 cg08798116 cg03155755 cg04422742 cg18580559 cg07126775 cg12284142 cg16471877 cg18618432 cg00730761 cg22077592 cg06609793 cg16221895 cg07393670
## cg08673419 cg00259849 cg08848711 cg27083089 cg26159385 cg04481635 cg20549346 cg25436480 cg06483046 cg00729461 cg14780957 cg07632860 cg23443158 cg16194687 cg08797383 cg14720319 cg08916385
## cg11732753 cg02074316 cg04771146 cg22346540 cg15499467 cg05208607 cg24608181 cg26674826 cg04217946 cg04727458 cg19198567 cg13815872 cg24086348 cg00968488 cg24904436 cg12534577 cg04317640
## cg26128147 cg02550738 cg17396400 cg00011891 cg05407200 cg01008088 cg15865722 cg11583848 cg07946630 cg15835795 cg03966315 cg06055478 cg07791065 cg23603995 cg24232980 cg25374269 cg09146364
## cg00200463 cg02615131 cg20078646 cg11268585 cg13913990 cg06417478 cg13799572 cg20964965 cg24090628 cg26023405 cg05111645 cg00832270 cg17939448 cg12785025 cg23762217 cg06621919 cg18310072
## cg26584339 cg18037388 cg01876809 cg15654812 cg06955954 cg14908122 cg16576930 cg03172493 cg17382566 cg03944921 cg13682241 cg06864789 cg17853057 cg18958984 cg14655569 cg17616663 cg04861534
## cg19441529 cg11882358 cg08280054 cg01871025 cg00729708 cg15216357 cg04675919 cg05688478 cg06700506 cg25179313 cg11663691 cg09207137 cg04316537 cg07383357 cg00113623 cg12469381 cg07989438
## cg20305683 cg06236987 cg18386008 cg22893969 cg11091790 cg18124907 cg19495614 cg05941375 cg19787013 cg17588704 cg25644740 cg12403148 cg03993368 cg14227325 cg03721887 cg16918438 cg24559073
## cg19068385 cg20035294 cg08216425 cg09965404 cg16499140 cg20360416 cg04657146 cg11152253 cg13388618 cg10403109 cg11424828 cg10306780 cg15094228 cg12620265 cg18709904 cg22202169 cg00963467
## cg07056794 cg00762003 cg16361921 cg27224751 cg05037630 cg22776211 cg16638301 cg11154719 cg21953876 cg15211500 cg17699276 cg03756044 cg04675306 cg04528326 cg00139317 cg12716696 cg21016188
## cg00116709 cg11399582 cg27049594 cg01914365 cg15295200 cg04575501 cg16954525 cg02484732 cg01384686 cg14859874 cg00892228 cg16527629 cg19407410 cg16617830 cg16144436 cg22505202 cg00939409
## cg09978401 cg18414950 cg16794291 cg10643429 cg08952424 cg18395382 cg15410402 cg17825572 cg11458217 cg14194326 cg26983017 cg26981746 cg17355066 cg00271873 cg24859648 cg26822438 cg02921434
## cg17189724 cg24584738 cg19716713 cg15184869 cg10470368 cg05534333 cg23779047 cg16412513 cg08429705 cg24697097 cg20608847 cg18845598 cg27611887 cg27522357 cg02496423 cg12255092 cg20756026
## cg00602930 cg17008556 cg01081438 cg23374711 cg25673075 cg01777565 cg07167872 cg01733439 cg06144999 cg00051154 cg18403317 cg16178271 cg13115118 cg00675157 cg04234536 cg10369879 cg23595826
## cg09371091 cg02656016 cg01491428 cg03288751 cg00814218 cg20208613 cg06995503 cg26033510 cg15734257 cg25846190 cg23834765 cg14983172 cg16162930 cg04780373 cg12109978 cg18136963 cg15774752
## cg07304760 cg14804181 cg25911220 cg23184276 cg20904336 cg16403901 cg14258356 cg06031234 cg23804921 cg27395310 cg03374522 cg11185978 cg18805164 cg02165546 cg26640879 cg00478198 cg08569059
## cg10678429 cg08720028 cg01565322 cg01504940 cg00645049 cg19373347 cg05873820 cg20968048 cg15124400 cg10566479 cg06403901 cg20781383 cg25442267 cg26654770 cg16505502 cg06266461 cg12918445
## cg12313868 cg26201401 cg06264882 cg09856996 cg13387643 cg19513111 cg15460297 cg11425580 cg22274273 cg19225953 cg14361804 cg04181991 cg13204538 cg16402757 cg17547524 cg06223162 cg23277098
## cg23919845 cg04768387 cg20017124 cg13649400 cg19415746 cg08268047 cg14705391 cg02049865 cg14626875 cg17268094 cg01128042 cg19389973 cg22053855 cg01124926 cg11450947 cg06834235 cg07488092
## cg03570263 cg09438069 cg17639056 cg22304519 cg05079227 cg05380919 cg27558057 cg06013788 cg22969661 cg26444086 cg25692928 cg17635970 cg00078867 cg27069132 cg17839758 cg17061760 cg25140213
## cg03038395 cg08636328 cg08270148 cg17238522 cg20445038 cg21225796 cg05867245 cg10950297 cg14507637 cg16202259 cg06615444 cg19079513 cg03691313 cg19799454 cg03020684 cg03723481 cg01530521
## cg23364541 cg09059153 cg18102950 cg10122885 cg03403996 cg25977965 cg02621658 cg25165144 cg26682103 cg11546683 cg14071112 cg22984586 cg19235109 cg05836189 cg17628377 cg25826070 cg26937008
## cg22164912 cg26375010 cg18828303 cg03992069 cg23098789 cg21259115 cg15582794 cg08198851 cg25388952 cg15029183 cg24694833 cg15257930 cg19178509 cg26035071 cg26421947 cg18288715 cg06837403
## cg11872370 cg02285579 cg04550935 cg01476442 cg12709057 cg24347720 cg26146690 cg04327763 cg13862711 cg14267065 cg04255382 cg16032134 cg24853868 cg26563651 cg05891136 cg02451693 cg05109619
## cg23627980 cg14831665 cg07474670 cg14006678 cg04412904 cg00011200 cg23943944 cg21760862 cg10423996 cg23350716 cg12704708 cg13688687 cg01427108 cg10880252 cg21397839 cg12729177 cg23392381
## cg18107314 cg11991151 cg19415116 cg22867893 cg02217425 cg05522042 cg12953206 cg04509103 cg17457545 cg02171833 cg11227702 cg13058551 cg13740636 cg18584561 cg27284883 cg00380985 cg14007688
## cg02945674 cg18150287 cg26296371 cg20961245 cg17393140 cg08446187 cg00445202 cg07149083 cg26454172 cg25598710 cg08275242 cg08839358 cg12206353 cg14856563 cg16976875 cg13591052 cg07115108
## cg25645840 cg14314132 cg04841583 cg03905487 cg00279662 cg14137558 cg19810816 cg17758652 cg15314470 cg26161652 cg24315885 cg04263740 cg10999462 cg17279365 cg12333628 cg06922212 cg10080013
## cg04026379 cg01342901 cg22162835 cg01150227 cg01451645 cg24648384 cg08024471 cg08265308 cg05416337 cg26536949 cg09639108 cg08338641 cg09197234 cg02839725 cg23187802 cg04027004 cg10088372
## cg11969330 cg14168080 cg10723556 cg08600378 cg11164659 cg19821612 cg14542879 cg15771339 cg19738233 cg00967012 cg08625210 cg10721440 cg02274705 cg27160885 cg22715629 cg24503407 cg08461617
## cg10637509 cg25165659 cg06079963 cg03198009 cg16232867 cg05351360 cg14068184 cg07355270 cg17441733 cg03885028 cg05193149 cg26121752 cg19584075 cg01982279 cg22223709 cg07484678 cg04627110
## cg08960045 cg09746326 cg07795413 cg05161773 cg20679188 cg11169344 cg18689730 cg02630646 cg12855313 cg27286614 cg22646149 cg12744907 cg14764203 cg01207755 cg05818501 cg14051366 cg03504002
## cg25306893 cg14918074 cg14181112 cg15369199 cg05785344 cg06124141 cg23855802 cg04072009 cg01333616 cg26439324 cg08421632 cg08648877 cg03325394 cg23896353 cg17724121 cg08455905 cg14893161
## cg13211008 cg04960964 cg20913114 cg21415084 cg02932958 cg25576048 cg05141217 cg05966078 cg14942092 cg22402121 cg02171206 cg05611160 cg24863802 cg11705504 cg13946163 cg00962106 cg22287211
## cg14629010 cg14331362 cg20004147 cg08986950 cg10315562 cg14767338 cg00849191 cg08745107 cg12143138 cg04277055 cg24961286 cg22142142 cg06945800 cg04872051 cg19062189 cg16755189 cg11281291
## cg01650464 cg21792493 cg06110166 cg06476934 cg12374770 cg01758122 cg05579559 cg24437580 cg06684911 cg22681945 cg21634944 cg00347850 cg11949518 cg27527657 cg08199506 cg23834181 cg26941787
## cg16006841 cg07134368 cg01740135 cg17369140 cg24153901 cg11438287 cg16570885 cg17811452 cg11233153 cg08873063 cg05461361 cg24232370 cg13423887 cg16000638 cg01662749 cg14782559 cg02890812
## cg17240976 cg01555661 cg16051083 cg09729660 cg11286989 cg25897349 cg15775217 cg06048169 cg27470278 cg24139837 cg12298823 cg25790212 cg06624143 cg13466755 cg05239680 cg04645024 cg11072201
## cg08055002 cg01280698 cg09139047 cg01366378 cg22430708 cg26642774 cg10061320 cg04455999 cg16639627 cg07037055 cg06191872 cg13830619 cg03104298 cg04302178 cg11789991 cg02427933 cg01318188
## cg13182391 cg16814680 cg14516385 cg15117507 cg17748470 cg18182981 cg15132216 cg05450979 cg03057303 cg11044162 cg21414424 cg17386473 cg22774704 cg11291009 cg22933800 cg11314779 cg21205654
## cg07211915 cg10296238 cg21697769 cg21691076 cg25880954 cg15536552 cg13739190 cg13851368 cg12833414 cg17430903 cg09797202 cg02770249 cg26864304 cg09664314 cg07628886 cg01405303 cg04149024
## cg09227616 cg19353052 cg09834142 cg12488572 cg05872808 cg03191359 cg25529585 cg02865277 cg00979438 cg06264089 cg00421199 cg09886258 cg17503853 cg06083932 cg07747558 cg12543766 cg15958422
## cg07781082 cg26129200 cg25445671 cg08041448 cg07781090 cg17441804 cg15547764 cg10499451 cg00063608 cg17818432 cg09863391 cg05874912 cg12738079 cg02968327 cg07712165 cg10347326 cg10919053
## cg13574174 cg08925606 cg18858121 cg19360212 cg12623396 cg23113041 cg18452169 cg26896756 cg20022541 cg21201934 cg19854896 cg18756931 cg19056391 cg00243527 cg15138543 cg03874513 cg11648471
## cg12743416 cg00829575 cg09120722 cg07799180 cg09253663 cg06704717 cg01326421 cg11401796 cg11823448 cg07365741 cg08102564 cg05323542 cg08880082 cg01303569 cg16871435 cg23251359 cg23496593
## cg17284124 cg26251192 cg03359666 cg17122979 cg27244972 cg08496601 cg16181678 cg07215528 cg08108619 cg17217478 cg02079756 cg27070288 cg27450744 cg03651054 cg01212677 cg11857805 cg16775095
## cg16715186 cg12646252 cg26764761 cg25247689 cg22955899 cg09310980 cg13324220 cg14513804 cg06824156 cg00433220 cg21560722 cg15247483 cg01463139 cg07516457 cg02489327 cg23991947 cg11173636
## cg05749243 cg22682304 cg25129414 cg08900396 cg14918591 cg00696044 cg06393529 cg04512759 cg21765125 cg21284493 cg16655091 cg12521790 cg09518270 cg05475474 cg23247655 cg15668967 cg13452830
## cg08983668 cg22635523 cg25601709 cg06352538 cg12727431 cg16962612 cg25452717 cg25249362 cg08750459 cg06548479 cg18714913 cg11519740 cg21692140 cg09034259 cg19770253 cg16515238 cg05455372
## cg07806343 cg03250346 cg25528646 cg05958126 cg08118032 cg03726259 cg04298672 cg21442271 cg07822777 cg12738248 cg26342575 cg18140045 cg06614969 cg04218584 cg19791271 cg05291429 cg05935584
## cg22953237 cg11720358 cg17971895 cg17530337 cg06115838 cg05383619 cg06549249 cg17602481 cg13276615 cg01332299 cg09705401 cg16432908 cg04319046 cg17341969 cg09510698 cg12768975 cg06394109
## cg25243082 cg16836675 cg03821194 cg02567750 cg03900860 cg11902811 cg12908908 cg04493908 cg07216619 cg02283535 cg17253931 cg24686902 cg12134602 cg02487331 cg26457165 cg05995465 cg17664833
## cg01759889 cg01924074 cg00597445 cg19848641 cg00084271 cg11111131 cg08282969 cg10055097 cg16675926 cg11194545 cg04302300 cg07773740 cg02107461 cg24883219 cg19718903 cg06804873 cg01013522
## cg00914218 cg12526470 cg16874089 cg08324927 cg13342259 cg27485646 cg02627240 cg19859323 cg13077484 cg20673830 cg09079173 cg26220594 cg24401557 cg14772068 cg01816891 cg03906572 cg26301245
## cg15559823 cg09535760 cg09579899 cg06225639 cg12129080 cg14002365 cg01720007 cg05803370 cg04396360 cg07166908 cg07507339 cg17824401 cg02605540 cg23913313 cg05084668 cg00616572 cg16245086
## cg10133369 cg13120260 cg04456492 cg25645693 cg21658164 cg03605032 cg08788093 cg07700317 cg07301957 cg03890843 cg20361843 cg14238671 cg05070493 cg06738063 cg26119746 cg15930598 cg08024264
## cg23365293 cg03812172 cg02316445 cg16817435 cg06538336 cg11261447 cg03617221 cg13077366 cg03370193 cg13117792 cg05096415 cg17995340 cg21019788 cg12154943 cg05929129 cg25514427 cg21785054
## cg04292836 cg08348649 cg17330938 cg03038914 cg07951602 cg17056069 cg26059639 cg16594779 cg24699914 cg01236565 cg02389264 cg25415674 cg10286673 cg26708920 cg07572984 cg14544514 cg03484420
## cg02168270 cg22917366 cg15586958 cg26864661 cg19780831 cg16243644 cg00201142 cg14378789 cg19707653 cg11607219 cg02619116 cg14369777 cg13516940 cg12624040 cg01153376 cg04882216 cg01225004
## cg05594230 cg02937794 cg18751375 cg08221357 cg21149357 cg14358839 cg10892068 cg23681001 cg23954206 cg14121685 cg09100196 cg08188907 cg24664551 cg06915321 cg17095460 cg15250633 cg09284209
## cg15600437 cg27609342 cg12064531 cg20662859 cg27300573 cg15002478 cg07530027 cg08693140 cg08669168 cg23352245 cg03167407 cg03324099 cg05213316 cg04924408 cg05091873 cg10662047 cg12016309
## cg22569627 cg09738386 cg05452887 cg27207144 cg13033971 cg18983709 cg16397968 cg23192736 cg22857134 cg20713636 cg23119380 cg02179438 cg20272343 cg06354054 cg00718752 cg07824128 cg04791822
## cg26590811 cg10691647 cg12322605 cg19797013 cg12077433 cg19238394 cg22307470 cg01387905 cg04508606 cg05365121 cg22787186 cg23727079 cg26801383 cg16531277 cg11851349 cg02295504 cg00553365
## cg18065464 cg01430241 cg17283620 cg06134910 cg11870452 cg09854620 cg21159768 cg16191297 cg05093818 cg11573182 cg11186706 cg16567137 cg24861747 cg00981879 cg04497820 cg15532640 cg15535487
## cg01414116 cg24832428 cg22504140 cg26936989 cg02510708 cg25692732 cg00939438 cg13928473 cg07210774 cg16852920 cg05092371 cg05061041 cg25790081 cg10780707 cg10050962 cg14247154 cg27353825
## cg19512141 cg22542451 cg02032561 cg21864829 cg15465836 cg16788857 cg16429499 cg15044932 cg16764296 cg17848104 cg10701746 cg00332268 cg15715844 cg07979524 cg12981362 cg11229715 cg25943986
## cg01991530 cg09636905 cg27015302 cg03111560 cg19332075 cg16180556 cg10274815 cg14911689 cg06378561 cg25929399 cg17386240 cg17917970 cg18786623 cg14737574 cg11047442 cg11540596 cg20707527
## cg04546413 cg26734875 cg17741448 cg18239511 cg22666875 cg06579087 cg13177959 cg19635884 cg04524851 cg16742675 cg09687597 cg11638117 cg12471283 cg11400068 cg06675417 cg13115455 cg06734157
## cg00534215 cg11673013 cg20767561 cg04156077 cg11727304 cg03187614 cg08624915 cg03828160 cg13825033 cg24114730 cg04467639 cg05176970 cg16458822 cg03276920 cg15876198 cg08950364 cg26764972
## cg20077602 cg26380710 cg23177161 cg17763566 cg14553323 cg25492195 cg08551408 cg15637874 cg16510200 cg21127593 cg13744306 cg07428182 cg24801230 cg04850148 cg00648024 cg21035907 cg20684491
## cg24417798 cg16423096 cg09352518 cg25150572 cg02891314 cg15391239 cg12449104 cg24017974 cg22111694 cg22823009 cg02401352 cg22459517 cg20372745 cg23660678 cg26813483 cg15579650 cg23541304
## cg18424635 cg01388693 cg14859618 cg13240932 cg06612594 cg18932722 cg04376185 cg07581973 cg25951717 cg26308359 cg09986921 cg14303457 cg17623720 cg07761942 cg06441867 cg07130381 cg18882436
## cg10983111 cg20442191 cg22712681 cg16723510 cg21787089 cg00859877 cg21681732 cg05875700 cg14992527 cg10981178 cg00532122 cg15975960 cg26371957 cg02622647 cg05116966 cg19616372 cg01802772
## (printed column names truncated here: roughly 5,000 CpG probe IDs, cg14651363 through cg01410230,
## followed by the phenotype columns)
## barcodes RID.a prop.B prop.NK prop.CD4T prop.CD8T prop.Mono prop.Neutro prop.Eosino DX age.now
## PTGENDER ABETA TAU PTAU PC1 PC2 PC3 ageGroup ageGroupsq DX_num uniqueID Horvath
## [ reached 'max' / getOption("max.print") -- omitted 6 rows ]
## [1] 554 5023
if(METHOD_FEATURE_FLAG == 5){
  pheno_data_m5 <- df_picked_m5[, phenotic_features_m5]
  print(head(pheno_data_m5[, 1:5], n = 3))
  design_m5 <- model.matrix(~0 + ., data = pheno_data_m5)
  colnames(design_m5)[colnames(design_m5) == "DXCN"] <- "CN"
  colnames(design_m5)[colnames(design_m5) == "DXMCI"] <- "MCI"
  print(head(design_m5))
  beta_values_m5 <- t(as.matrix(df_fs_method5[, featureName_CpGs]))
}
## DX age.now PTGENDER PC1 PC2
## 200223270003_R02C01 MCI 82.4 Male -0.214185447 0.01470293
## 200223270003_R03C01 CN 78.6 Female -0.172761185 0.05745834
## 200223270003_R06C01 CN 80.4 Female -0.003667305 0.08372861
## CN MCI age.now PTGENDERMale PC1 PC2 PC3
## 200223270003_R02C01 0 1 82.40000 1 -0.214185447 1.470293e-02 -0.014043316
## 200223270003_R03C01 1 0 78.60000 0 -0.172761185 5.745834e-02 0.005055871
## 200223270003_R06C01 1 0 80.40000 0 -0.003667305 8.372861e-02 0.029143653
## 200223270006_R01C01 0 1 62.90000 0 0.026814649 1.650735e-05 0.052947950
## 200223270006_R04C01 1 0 80.67796 0 -0.037862929 1.571950e-02 -0.008685676
## 200223270006_R07C01 0 1 80.60000 0 0.122391548 3.458436e-02 0.051136541
To perform the differential analysis, identifying Differentially Methylated Positions (DMPs), we must define the contrast we are interested in. In method 5 we compare two groups (CN and MCI), so there is a single contrast of interest.
if(METHOD_FEATURE_FLAG == 5){
  fit_m5 <- lmFit(beta_values_m5, design_m5)
  head(fit_m5$coefficients)
  contrast.matrix <- makeContrasts(MCI - CN, levels = design_m5)
  fit2_m5 <- contrasts.fit(fit_m5, contrast.matrix)
  # Apply the empirical Bayes step to obtain the moderated test statistics and p-values.
  fit2_m5 <- eBayes(fit2_m5)
}
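With the `~0 + .` design used above, each diagnosis coefficient is simply that group's mean, so the `MCI - CN` contrast is a difference of group means. A minimal base-R check of that identity (toy data, hypothetical values, no limma required):

```r
# Toy data: one "CpG" measured in two diagnosis groups
dx   <- factor(c("CN", "CN", "CN", "MCI", "MCI", "MCI"))
beta <- c(0.20, 0.25, 0.30, 0.40, 0.45, 0.50)

# ~0 + dx gives one indicator column per group (no intercept),
# so each fitted coefficient is that group's mean
fit <- lm(beta ~ 0 + dx)

# The contrast MCI - CN equals the difference of group means
contrast_est <- unname(coef(fit)["dxMCI"] - coef(fit)["dxCN"])
manual_diff  <- mean(beta[dx == "MCI"]) - mean(beta[dx == "CN"])
all.equal(contrast_est, manual_diff)  # TRUE
```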
if(METHOD_FEATURE_FLAG == 5){
decideTests(fit2_m5)
}
## TestResults matrix
## Contrasts
## MCI - CN
## cg08223187 0
## cg15794987 0
## cg04821830 0
## cg24629711 0
## cg17380855 0
## 4995 more rows ...
if(METHOD_FEATURE_FLAG == 5){
dmp_results_m5_try1 <- decideTests(
fit2_m5, lfc = 0.01, adjust.method = "fdr", p.value = 0.1)
table(dmp_results_m5_try1)
}
## dmp_results_m5_try1
## 0
## 5000
These thresholds are too strict: with FDR adjustment no CpG is called significant. Let's relax the multiple-testing correction.
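To see why the FDR-adjusted run returned nothing, recall that Benjamini-Hochberg adjustment inflates each raw p-value by roughly n divided by its rank. A short sketch on toy p-values (illustrative only):

```r
# Four hypothetical raw p-values, all close to a 0.05 threshold
raw_p <- c(0.001, 0.04, 0.045, 0.05)
fdr_p <- p.adjust(raw_p, method = "fdr")  # Benjamini-Hochberg
rbind(raw = raw_p, fdr = fdr_p)

# Three pass a raw 0.05 cutoff, but only one survives FDR adjustment
sum(raw_p < 0.05)  # 3
sum(fdr_p < 0.05)  # 1
```

With 5,000 CpGs instead of four, the inflation is far more severe, which is why `adjust.method = "fdr"` left nothing significant here.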
if(METHOD_FEATURE_FLAG == 5){
  # Identify DMPs; this is the result set we will use:
  dmp_results_m5 <- decideTests(
    fit2_m5, lfc = 0.01, adjust.method = "none", p.value = 0.1)
  table(dmp_results_m5)
}
## dmp_results_m5
## -1 0 1
## 208 4607 185
if(METHOD_FEATURE_FLAG == 5){
  significant_dmp_filter <- dmp_results_m5 != 0
  significant_cpgs_m5_DMP <- rownames(dmp_results_m5)[
    apply(significant_dmp_filter, 1, any)]
  pickedFeatureName_m5_afterDMP <- c(phenotic_features_m5, significant_cpgs_m5_DMP)
  df_picked_m5 <- df_picked_m5[, pickedFeatureName_m5_afterDMP]
  dim(df_picked_m5)
}
## [1] 554 399
The "volcano plot" is one way to visualize the results of a differential methylation analysis.
The x-axis shows the log fold change in methylation levels between the two classes. The log fold change (logFC) can be calculated as \(\log_2 \left( \frac{\text{mean}(\text{Group1})}{\text{mean}(\text{Group2})} \right)\).
Interpretation of logFC:
Positive logFC: the measurement is higher in the first group than in the second; here this indicates hypermethylation (an increase in methylation).
Negative logFC: the measurement is lower in the first group than in the second; here this indicates hypomethylation (a decrease in methylation) in the experimental condition relative to the reference.
logFC of 0: no difference in the measurement between the two groups.
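A quick numeric illustration of the logFC definition above (hypothetical group means):

```r
mean_g1 <- 0.8  # e.g. mean methylation in group 1 (hypothetical)
mean_g2 <- 0.4  # e.g. mean methylation in group 2 (hypothetical)

logFC <- log2(mean_g1 / mean_g2)
logFC           #  1: group 1 is twice group 2, i.e. hypermethylated
log2(0.4 / 0.8) # -1: hypomethylated
log2(0.5 / 0.5) #  0: no difference
```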
The y-axis shows a measure of statistical significance, such as the log-odds or "B" statistic. In the following we use the B statistic, the natural logarithm of the posterior odds of differential methylation: \(B = \log_e(\text{posterior odds})\).
Interpretation of B-value:
Higher B-value: Indicates stronger evidence for differential methylation.
Lower (or negative) B-value: Indicates weaker evidence for differential methylation.
B-value close to zero: Indicates uncertainty or lack of strong evidence for differential methylation.
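Since B is the natural log of the posterior odds, it maps to a posterior probability of differential methylation via \(p = e^B / (1 + e^B)\), the logistic function. A short sketch (illustrative B values):

```r
# Convert a B statistic (log posterior odds) to a posterior probability;
# this is exactly the logistic function, i.e. plogis(B) in base R
b_to_prob <- function(B) exp(B) / (1 + exp(B))

b_to_prob(0)   # 0.5  : even odds, maximal uncertainty
b_to_prob(3)   # ~0.95: strong evidence for differential methylation
b_to_prob(-3)  # ~0.05: weak evidence
```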
A characteristic “volcano” shape should be seen. Let’s look at the results:
if(METHOD_FEATURE_FLAG == 5){
full_results_m5 <- topTable(fit2_m5, number=Inf)
full_results_m5 <- tibble::rownames_to_column(full_results_m5,"ID")
head(full_results_m5)
}
if(METHOD_FEATURE_FLAG == 5){
sorted_full_results_m5 <- full_results_m5[
order(full_results_m5$logFC, decreasing = TRUE), ]
head(sorted_full_results_m5)
}
if(METHOD_FEATURE_FLAG == 5){
library(ggplot2)
ggplot(full_results_m5,aes(x = logFC, y=B)) + geom_point()
}
Now, let's redraw the plot with the significance cutoffs highlighted.
if(METHOD_FEATURE_FLAG == 5){
  library(dplyr)
  library(ggrepel)
  p_cutoff <- 0.1
  fc_cutoff <- 0.01
  topN <- 20
  full_results_m5 <- full_results_m5 %>%
    mutate(Significant = P.Value < p_cutoff & abs(logFC) > fc_cutoff) %>%
    mutate(Rank = rank(-abs(logFC)),
           Label = ifelse(Rank <= topN, as.character(ID), ""))
  ggplot(full_results_m5, aes(x = logFC,
                              y = B, col = Significant, label = Label)) +
    geom_point() +
    geom_text_repel(col = "black")
}
## Warning: ggrepel: 4 unlabeled data points (too many overlaps). Consider increasing max.overlaps
Now, let's change the y-axis to \(-\log_{10}(\text{P value})\).
if(METHOD_FEATURE_FLAG == 5){
ggplot(full_results_m5,aes(x = logFC, y=-log10(P.Value))) + geom_point()
}
if(METHOD_FEATURE_FLAG == 5){
  library(dplyr)
  library(ggrepel)
  p_cutoff <- 0.1
  fc_cutoff <- 0.01
  topN <- 20
  full_results_m5 <- full_results_m5 %>%
    mutate(Significant = P.Value < p_cutoff & abs(logFC) > fc_cutoff) %>%
    mutate(Rank = rank(-abs(logFC)),
           Label = ifelse(Rank <= topN, as.character(ID), ""))
  ggplot(full_results_m5,
         aes(x = logFC, y = -log10(P.Value),
             col = Significant,
             label = Label)) +
    geom_point() +
    geom_text_repel(col = "black")
}
## Warning: ggrepel: 5 unlabeled data points (too many overlaps). Consider increasing max.overlaps
if(METHOD_FEATURE_FLAG == 5){
  library(recipes)
  rec <- recipe(DX ~ ., data = df_picked_m5) %>%
    step_zv(all_predictors()) %>%
    # step_range(all_numeric(), -all_outcomes()) %>%
    step_dummy(all_nominal(), -all_outcomes()) %>%
    step_corr(all_predictors(), threshold = 0.7)
  rec_prep <- prep(rec, df_picked_m5)
  processed_data_m5 <- bake(rec_prep, new_data = df_picked_m5)
  processed_data_m5_df <- as.data.frame(processed_data_m5)
  rownames(processed_data_m5_df) <- rownames(df_picked_m5)
  print(dim(processed_data_m5))
}
## [1] 554 325
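The step_corr() filter in the recipe drops one member of each predictor pair whose absolute correlation exceeds 0.7 (together with step_zv and step_dummy, the pipeline reduces 399 columns to 325 here). The core idea can be sketched in base R; this is an illustrative greedy filter, not the exact recipes algorithm:

```r
# Toy predictors: x2 is a linear copy of x1, x3 is weakly related to both
df <- data.frame(
  x1 = c(1, 2, 3, 4, 5, 6),
  x2 = c(2, 4, 6, 8, 10, 12),  # cor(x1, x2) == 1
  x3 = c(5, 1, 4, 2, 6, 3)
)

drop_correlated <- function(data, threshold = 0.7) {
  cors <- abs(cor(data))
  vars <- colnames(data)
  drop <- character(0)
  for (i in seq_along(vars)) {
    for (j in seq_len(i - 1)) {
      # if pair (j, i) is too correlated and j survives, drop i
      if (cors[j, i] > threshold && !(vars[j] %in% drop)) {
        drop <- union(drop, vars[i])
      }
    }
  }
  setdiff(vars, drop)
}

drop_correlated(df)  # "x1" "x3": x2 removed as redundant with x1
```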
if(METHOD_FEATURE_FLAG == 5){
AfterProcess_FeatureName_m5<-colnames(processed_data_m5)
print(length(AfterProcess_FeatureName_m5))
head(AfterProcess_FeatureName_m5)
tail(AfterProcess_FeatureName_m5)
}
## [1] 325
## [1] "cg27577781" "cg20685672" "cg07478795" "cg03660162" "cg17042243" "DX"
if(METHOD_FEATURE_FLAG == 5){
levels(df_picked_m5$DX)
}
## [1] "CN" "MCI"
if(METHOD_FEATURE_FLAG == 5){
lastColumn_NUM_m5<-dim(processed_data_m5)[2]
last5Column_NUM_m5<-lastColumn_NUM_m5-5
head(processed_data_m5[,last5Column_NUM_m5 :lastColumn_NUM_m5])
}
if(METHOD_FEATURE_FLAG == 5){
print(levels(processed_data_m5$DX))
print(dim(processed_data_m5))
}
## [1] "CN" "MCI"
## [1] 554 325
In this method, only the MCI and AD (Dementia) classes will be considered.
if(METHOD_FEATURE_FLAG == 6){
df_fs_method6<-clean_merged_df
}
if(METHOD_FEATURE_FLAG == 6){
  phenotic_features_m6 <- c(
    "DX", "age.now", "PTGENDER", "PC1", "PC2", "PC3")
  pickedFeatureName_m6 <- c(phenotic_features_m6, featureName_CpGs)
  df_picked_m6 <- df_fs_method6[, pickedFeatureName_m6]
  df_picked_m6$DX <- as.factor(df_picked_m6$DX)
  df_picked_m6$PTGENDER <- as.factor(df_picked_m6$PTGENDER)
  head(df_picked_m6[, 1:5], n = 3)
}
if(METHOD_FEATURE_FLAG == 6){
dim(df_picked_m6)
}
if(METHOD_FEATURE_FLAG == 6){
df_picked_m6<-df_picked_m6 %>% filter(DX != "CN") %>% droplevels()
df_picked_m6$DX<-as.factor(df_picked_m6$DX)
df_picked_m6$PTGENDER<-as.factor(df_picked_m6$PTGENDER)
head(df_picked_m6[1:10],n=3)
}
if(METHOD_FEATURE_FLAG == 6){
print(dim(df_picked_m6))
print(table(df_picked_m6$DX))
}
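The droplevels() call after filtering matters: subsetting a factor keeps all of its original levels, so an empty "CN" level would otherwise persist and produce an all-zero column in model.matrix() later. A minimal base-R illustration:

```r
# Subsetting a factor does NOT remove unused levels
dx  <- factor(c("CN", "MCI", "Dementia", "MCI"))
sub <- dx[dx != "CN"]

levels(sub)              # "CN" "Dementia" "MCI": CN lingers, empty
levels(droplevels(sub))  # "Dementia" "MCI"

# Without droplevels, model.matrix() would emit an all-zero CN column
colnames(model.matrix(~ 0 + sub))
```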
if(METHOD_FEATURE_FLAG == 6){
df_fs_method6 <- df_fs_method6 %>% filter(DX != "CN") %>% droplevels()
df_fs_method6$DX<-as.factor(df_fs_method6$DX)
print(head(df_fs_method6))
print(dim(df_fs_method6))
}
if(METHOD_FEATURE_FLAG == 6){
  pheno_data_m6 <- df_picked_m6[, phenotic_features_m6]
  print(head(pheno_data_m6[, 1:5], n = 3))
  design_m6 <- model.matrix(~0 + ., data = pheno_data_m6)
  colnames(design_m6)[colnames(design_m6) == "DXDementia"] <- "Dementia"
  colnames(design_m6)[colnames(design_m6) == "DXMCI"] <- "MCI"
  print(head(design_m6))
  beta_values_m6 <- t(as.matrix(df_fs_method6[, featureName_CpGs]))
}
In order to perform the differential analysis for Differentially Methylated Positions (DMPs), we have to define the contrast that we are interested in. In this method 6, we focus on two groups (MCI and Dementia), so there is a single contrast of interest.
if(METHOD_FEATURE_FLAG == 6){
fit_m6 <- lmFit(beta_values_m6, design_m6)
head(fit_m6$coefficients)
contrast.matrix <- makeContrasts(MCI - Dementia, levels = design_m6)
fit2_m6 <- contrasts.fit(fit_m6, contrast.matrix)
# Apply the empirical Bayes step to get our differential methylation statistics and p-values.
fit2_m6 <- eBayes(fit2_m6)
}
if(METHOD_FEATURE_FLAG == 6){
decideTests(fit2_m6)
}
if(METHOD_FEATURE_FLAG == 6){
dmp_results_m6_try1 <- decideTests(
fit2_m6, lfc = 0.01, adjust.method = "fdr", p.value = 0.1)
table(dmp_results_m6_try1)
}
These constraints are too tight; let’s relax them.
if(METHOD_FEATURE_FLAG == 6){
# Identify DMPs, we will use this one:
dmp_results_m6 <- decideTests(
fit2_m6, lfc = 0.01, adjust.method = "none", p.value = 0.1)
table(dmp_results_m6)
}
if(METHOD_FEATURE_FLAG == 6){
significant_dmp_filter <- dmp_results_m6 != 0
significant_cpgs_m6_DMP <- rownames(dmp_results_m6)[
apply(significant_dmp_filter, 1, any)]
pickedFeatureName_m6_afterDMP<-c(phenotic_features_m6,significant_cpgs_m6_DMP)
df_picked_m6<-df_picked_m6[,pickedFeatureName_m6_afterDMP]
dim(df_picked_m6)
}
The “Volcano Plot” is one way to visualize the results of a differential analysis.
The x-axis shows the log-fold change in methylation levels between the two classes. The log fold change (logFC) can be calculated as \(\log_2 \left( \frac{\text{mean}(\text{Group1})}{\text{mean}(\text{Group2})} \right)\).
Interpretation of logFC:
Positive logFC: the measurement is higher in the first group than in the second; here this means hypermethylation (an increase in methylation).
Negative logFC: the measurement is lower in the first group than in the second; here this means hypomethylation (a decrease in methylation) in the experimental condition compared to the reference.
logFC of 0: no difference in the measurement between the two groups.
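As a minimal sketch of the logFC formula above (toy beta values for two hypothetical groups, not from this data set):

```r
# Toy methylation beta values for two hypothetical groups
group1 <- c(0.80, 0.85, 0.90)
group2 <- c(0.40, 0.45, 0.50)
# logFC as the log2 ratio of the group means
logFC <- log2(mean(group1) / mean(group2))
print(logFC) # ~0.92, i.e. hypermethylated in group1 relative to group2
```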
The y-axis shows a measure of statistical significance, such as the log-odds, or “B”, statistic. In the following, we will use the B statistic, defined as the natural log of the posterior odds: \(B = \log_e(\text{posterior odds})\).
Interpretation of the B-value:
Higher B-value: stronger evidence for differential methylation.
Lower (or negative) B-value: weaker evidence for differential methylation.
B-value close to zero: uncertainty, or no strong evidence for differential methylation either way.
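Since B is the natural log of the posterior odds, it can be mapped back to a posterior probability of differential methylation with the logistic function; a minimal sketch (the B value here is illustrative):

```r
# Convert a B statistic (log posterior odds) to a posterior probability
B <- 1.5
posterior_prob <- exp(B) / (1 + exp(B))
print(posterior_prob) # ~0.82: fairly strong evidence
# B = 0 corresponds to posterior odds of 1, i.e. a probability of 0.5
```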
A characteristic “volcano” shape should be seen. Let’s look at the results:
if(METHOD_FEATURE_FLAG == 6){
full_results_m6 <- topTable(fit2_m6, number=Inf)
full_results_m6 <- tibble::rownames_to_column(full_results_m6,"ID")
head(full_results_m6)
}
if(METHOD_FEATURE_FLAG == 6){
sorted_full_results_m6 <- full_results_m6[
order(full_results_m6$logFC, decreasing = TRUE), ]
head(sorted_full_results_m6)
}
if(METHOD_FEATURE_FLAG == 6){
library(ggplot2)
ggplot(full_results_m6,aes(x = logFC, y=B)) + geom_point()
}
Now, let’s redraw the plot with the significance cutoffs applied:
if(METHOD_FEATURE_FLAG == 6){
library(dplyr)
library(ggrepel)
p_cutoff <- 0.1
fc_cutoff <- 0.01
topN <- 20
full_results_m6 <- full_results_m6 %>%
mutate(Significant = P.Value < p_cutoff & abs(logFC) > fc_cutoff) %>%
mutate(Rank = rank(-abs(logFC)),
Label = ifelse(Rank <= topN, as.character(ID), ""))
ggplot(full_results_m6, aes(x = logFC,
y = B, col = Significant, label = Label)) +
geom_point() +
geom_text_repel(col = "black")
}
Now, let’s change the y-axis to the P-value:
if(METHOD_FEATURE_FLAG == 6){
ggplot(full_results_m6,aes(x = logFC, y=-log10(P.Value))) + geom_point()
}
if(METHOD_FEATURE_FLAG == 6){
library(dplyr)
library(ggrepel)
p_cutoff <- 0.1
fc_cutoff <- 0.01
topN <- 20
full_results_m6 <- full_results_m6 %>%
mutate(Significant = P.Value < p_cutoff & abs(logFC) > fc_cutoff) %>%
mutate(Rank = rank(-abs(logFC)),
Label = ifelse(Rank <= topN, as.character(ID), ""))
ggplot(full_results_m6,
aes(x = logFC, y = -log10(P.Value),
col = Significant,
label = Label)) +
geom_point() +
geom_text_repel(col = "black")
}
if(METHOD_FEATURE_FLAG == 6){
library(recipes)
rec <- recipe(DX ~ ., data = df_picked_m6) %>%
step_zv(all_predictors()) %>%
# step_range(all_numeric(), -all_outcomes()) %>%
step_dummy(all_nominal(), -all_outcomes())%>%
step_corr(all_predictors(), threshold = 0.7)
rec_prep <- prep(rec, df_picked_m6)
processed_data_m6 <- bake(rec_prep, new_data = df_picked_m6)
processed_data_m6_df <- as.data.frame(processed_data_m6)
rownames(processed_data_m6_df) <- rownames(df_picked_m6)
print(dim(processed_data_m6))
}
if(METHOD_FEATURE_FLAG == 6){
AfterProcess_FeatureName_m6<-colnames(processed_data_m6)
print(length(AfterProcess_FeatureName_m6))
head(AfterProcess_FeatureName_m6)
tail(AfterProcess_FeatureName_m6)
}
if(METHOD_FEATURE_FLAG == 6){
levels(df_picked_m6$DX)
}
if(METHOD_FEATURE_FLAG == 6){
lastColumn_NUM_m6<-dim(processed_data_m6)[2]
last5Column_NUM_m6<-lastColumn_NUM_m6-4 # index of the 5th column from the end
head(processed_data_m6[,last5Column_NUM_m6:lastColumn_NUM_m6])
}
if(METHOD_FEATURE_FLAG == 6){
print(levels(processed_data_m6$DX))
print(dim(processed_data_m6))
}
The name for “processed_data” can be one of:
“processed_data_m1”, which uses method one to process the data.
“processed_data_m2”, which uses method two to process the data; note that the features will be principal components.
“processed_data_m3”, which uses method three to process the data. This method transfers “DX” to a binary class: “CN” stays the same, while “MCI” and “Dementia” are merged into “CI”.
Note that “processed_data_m3_df” is the data-frame form of “processed_data_m3”, with sample names as row names.
“processed_data_m4”, which uses method four to process the data. This method filters “DX” (drops the “MCI” class), limiting it to the CN and Dementia (AD) classes.
“processed_data_m5”, which uses method five to process the data. This method filters “DX” (drops the “Dementia” class), limiting it to the CN and MCI classes.
“processed_data_m6”, which uses method six to process the data. This method filters “DX” (drops the “CN” class), limiting it to the MCI and Dementia classes.
The name for “AfterProcess_FeatureName” (which includes the “DX” label) follows the same suffix convention:
if(METHOD_FEATURE_FLAG==1){
processed_dataFrame<-processed_data_m1_df
processed_data<-processed_data_m1
AfterProcess_FeatureName<-AfterProcess_FeatureName_m1
}
if(METHOD_FEATURE_FLAG==2){
processed_dataFrame<-processed_data_m2_df
processed_data<-processed_data_m2
AfterProcess_FeatureName<-AfterProcess_FeatureName_m2
}
if(METHOD_FEATURE_FLAG==3){
processed_dataFrame<-processed_data_m3_df
processed_data<-processed_data_m3
AfterProcess_FeatureName<-AfterProcess_FeatureName_m3
}
if(METHOD_FEATURE_FLAG==4){
processed_dataFrame<-processed_data_m4_df
processed_data<-processed_data_m4
AfterProcess_FeatureName<-AfterProcess_FeatureName_m4
}
if(METHOD_FEATURE_FLAG==5){
processed_dataFrame<-processed_data_m5_df
processed_data<-processed_data_m5
AfterProcess_FeatureName<-AfterProcess_FeatureName_m5
}
if(METHOD_FEATURE_FLAG==6){
processed_dataFrame<-processed_data_m6_df
processed_data<-processed_data_m6
AfterProcess_FeatureName<-AfterProcess_FeatureName_m6
}
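The six dispatch blocks above could equivalently be written as a single suffix lookup via get(); a sketch, assuming the per-method objects created earlier exist in the current environment:

```r
# Build the object names from the method flag and fetch them by name
suffix <- paste0("_m", METHOD_FEATURE_FLAG)
processed_dataFrame      <- get(paste0("processed_data", suffix, "_df"))
processed_data           <- get(paste0("processed_data", suffix))
AfterProcess_FeatureName <- get(paste0("AfterProcess_FeatureName", suffix))
```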
print(head(processed_dataFrame))
## age.now PC1 PC2 PC3 cg18993517 cg13573375 cg02621446 cg24470466 cg08896901 cg12146221 cg05234269 cg14293999 cg19377607 cg14307563 cg17018422 cg08914944
## 200223270003_R02C01 82.4 -0.214185447 0.01470293 -0.014043316 0.2091538 0.8670419 0.8731313 0.7725300 0.3581911 0.2049284 0.93848584 0.2836710 0.05377464 0.1855966 0.5262747 0.63423942
## 200223270003_R03C01 78.6 -0.172761185 0.05745834 0.005055871 0.2665896 0.1733934 0.8095534 0.9041432 0.2467071 0.1814927 0.57461229 0.9172023 0.90570746 0.8916957 0.9029604 0.04392811
## 200223270003_R06C01 80.4 -0.003667305 0.08372861 0.029143653 0.2574003 0.8888246 0.7511582 0.1206738 0.9225209 0.8619250 0.02467208 0.9168166 0.06636174 0.8750052 0.5100750 0.06893322
## cg21209485 cg11331837 cg11187460 cg09451339 cg11866463 cg06961873 cg14564293 cg02464073 cg07437923 cg12012426 cg23159970 cg10788927 cg05392160 cg04540199 cg10890644 cg18285382
## 200223270003_R02C01 0.8865053 0.03692842 0.03672179 0.2243746 0.9081537 0.5335591 0.52089591 0.4842537 0.03675396 0.9165048 0.61817246 0.8973154 0.9328933 0.8165865 0.1402372 0.3202927
## 200223270003_R03C01 0.8714878 0.57150125 0.92516409 0.2340702 0.2542510 0.5472606 0.04000662 0.4998933 0.64563331 0.9434768 0.57492600 0.2021398 0.2576881 0.7964195 0.1348023 0.2930577
## 200223270003_R06C01 0.2292550 0.03182862 0.03109553 0.8921284 0.9041234 0.9415177 0.04959460 0.9077933 0.04533840 0.9220044 0.03288909 0.2053075 0.8920726 0.4698047 0.1407028 0.8923595
## cg03796003 cg24851651 cg22071943 cg03549208 cg17653352 cg16405337 cg04831745 cg07640670 cg25879395 cg09708852 cg05593887 cg26705599 cg07227024 cg21783012 cg03221390 cg04728936
## 200223270003_R02C01 0.89227099 0.03674702 0.8705217 0.9014487 0.9269778 0.6177291 0.61984995 0.58296513 0.88130864 0.2843446 0.5939220 0.8585917 0.04553128 0.9142369 0.5859063 0.2172057
## 200223270003_R03C01 0.86011668 0.05358297 0.2442648 0.8381784 0.9086951 0.6131717 0.71214149 0.55225610 0.02603438 0.2897826 0.5766550 0.8613854 0.05004286 0.6694884 0.9180706 0.1925451
## 200223270003_R06C01 0.08518098 0.05968923 0.2644581 0.9097817 0.9341775 0.6098664 0.06871768 0.04058533 0.91060615 0.8896436 0.9148338 0.4332832 0.06152206 0.9070112 0.6399867 0.2379376
## cg10985055 cg09289202 cg15912814 cg03327352 cg20300784 cg06286533 cg05130642 cg15491125 cg04462915 cg08880261 cg25649515 cg08779649 cg22169467 cg00553601 cg03749159 cg26069044
## 200223270003_R02C01 0.8518169 0.4361103 0.8342997 0.8851712 0.86585964 0.2734841 0.8575504 0.9066635 0.03224861 0.40655904 0.9279829 0.44449401 0.3095010 0.05601299 0.9355921 0.9240187
## 200223270003_R03C01 0.8631895 0.4397504 0.8673032 0.8786878 0.86609999 0.9354924 0.8644077 0.3850991 0.50740695 0.85616966 0.9235753 0.45076825 0.2978585 0.58957701 0.9153921 0.9407223
## 200223270003_R06C01 0.5456633 0.4193555 0.8455862 0.3042310 0.03091187 0.8696546 0.3661324 0.9091504 0.02700644 0.03280808 0.5895839 0.04810217 0.8955853 0.62426500 0.9255807 0.9332131
## cg03088219 cg17738613 cg02772171 cg17186592 cg16956806 cg06012903 cg14192979 cg17906851 cg01933473 cg16089727 cg18857647 cg00767423 cg16771215 cg03737947 cg26679884 cg22305850
## 200223270003_R02C01 0.844002862 0.6879612 0.9182018 0.9230463 0.5429432 0.7964595 0.06336040 0.9488392 0.2589014 0.86748697 0.8582332 0.9298253 0.88389723 0.91824910 0.6793815 0.03361934
## 200223270003_R03C01 0.007435243 0.6582258 0.5660559 0.8593448 0.9269300 0.1933431 0.06019651 0.9529718 0.6726133 0.54996692 0.8394132 0.2651854 0.07196933 0.92067153 0.1848705 0.57522232
## 200223270003_R06C01 0.120155222 0.1022257 0.8995479 0.8467599 0.5973919 0.1960773 0.52114282 0.6462151 0.2642560 0.05876736 0.2647491 0.8667808 0.09949974 0.03638091 0.1701734 0.58548744
## cg26846609 cg21139150 cg25059696 cg21388339 cg13815695 cg26642936 cg04888234 cg08096656 cg03600007 cg11438323 cg27086157 cg15535896 cg18698799 cg12501287 cg14228103 cg08514194
## 200223270003_R02C01 0.48860949 0.01853264 0.9017504 0.2756268 0.9267057 0.7619266 0.8379655 0.9362594 0.5658487 0.4863471 0.9224112 0.3382952 0.70099633 0.4654925 0.9141064 0.9128478
## 200223270003_R03C01 0.04878986 0.43223243 0.3047156 0.2102269 0.6859729 0.7023413 0.4376314 0.9314878 0.6018832 0.8984559 0.9219304 0.9253926 0.05812989 0.5126917 0.8591302 0.2613138
## 200223270003_R06C01 0.48026945 0.43772680 0.3051179 0.7649181 0.6509046 0.7099380 0.8039047 0.4943033 0.8611166 0.8722772 0.3224986 0.3320191 0.06957486 0.9189144 0.1834348 0.9202187
## cg10738648 cg08138245 cg23836570 cg09785377 cg16536985 cg12784167 cg15633912 cg02495179 cg04875706 cg19471911 cg20823859 cg02078724 cg03635532 cg04242342 cg09247979 cg02246922
## 200223270003_R02C01 0.44931577 0.8115760 0.58688450 0.9162088 0.5789643 0.81503498 0.1605530 0.6813307 0.5790542 0.6334393 0.9030711 0.3096774 0.8416733 0.8206769 0.5070956 0.7301201
## 200223270003_R03C01 0.49894016 0.1109940 0.54259383 0.9226292 0.5418687 0.02811410 0.9333421 0.7373055 0.9255066 0.8437175 0.6062985 0.2896133 0.8262538 0.8167892 0.5706177 0.9447019
## 200223270003_R06C01 0.05552024 0.7444698 0.03267304 0.6405193 0.8392044 0.03073269 0.8737362 0.5588114 0.9155843 0.6127952 0.8917348 0.2805612 0.8450480 0.8040357 0.5090215 0.7202230
## cg25436480 cg06483046 cg12534577 cg02550738 cg01008088 cg20078646 cg11268585 cg06864789 cg11882358 cg04316537 cg00939409 cg26983017 cg24697097 cg06403901 cg04768387 cg17268094
## 200223270003_R02C01 0.8425160 0.04383925 0.8585231 0.6201457 0.8424817 0.06198170 0.2521544 0.05369415 0.89136326 0.8074830 0.2652180 0.89868232 0.6760078 0.92790690 0.3131047 0.5774753
## 200223270003_R03C01 0.4994032 0.50720277 0.8493466 0.9011727 0.2417656 0.89537412 0.8535791 0.46053125 0.04943344 0.8453340 0.8882671 0.03145466 0.1321724 0.04783341 0.9465814 0.9003262
## 200223270003_R06C01 0.3494312 0.89604910 0.8395241 0.9085849 0.2618620 0.08725521 0.9121931 0.87513655 0.80176322 0.4351695 0.8842646 0.84677625 0.6566922 0.05253626 0.9098563 0.8789368
## cg01128042 cg17061760 cg14507637 cg16202259 cg19799454 cg03723481 cg08198851 cg05891136 cg04412904 cg12953206 cg11227702 cg18150287 cg25598710 cg12333628 cg22162835 cg14168080
## 200223270003_R02C01 0.9113420 0.08726914 0.9051258 0.9548726 0.9178930 0.4347333 0.6578905 0.7797403 0.05088595 0.2364836 0.86486075 0.7685695 0.3105752 0.9227884 0.8841958 0.4190123
## 200223270003_R03C01 0.5328806 0.59377488 0.9009460 0.3713483 0.9106247 0.9007774 0.6578186 0.3310206 0.07717659 0.2338141 0.49184121 0.7519166 0.3088142 0.9092861 0.8639817 0.4420256
## 200223270003_R06C01 0.5222757 0.83354475 0.9013686 0.4852461 0.9066551 0.8947417 0.1272153 0.7965298 0.08253743 0.6638030 0.02543724 0.2501173 0.8538820 0.5084647 0.9112085 0.4355521
## cg27160885 cg05161773 cg11169344 cg25306893 cg14181112 cg08455905 cg21415084 cg00962106 cg08745107 cg11286989 cg15775217 cg24139837 cg04645024 cg22933800 cg11314779 cg21697769
## 200223270003_R02C01 0.2231606 0.4120912 0.6720163 0.6265392 0.7043545 0.9052876 0.8374415 0.9124898 0.02921338 0.7590008 0.5707441 0.07404605 0.7366541 0.4830774 0.0242134 0.8946108
## 200223270003_R03C01 0.8263885 0.4154907 0.8215477 0.8330282 0.1615405 0.9211801 0.8509420 0.5375751 0.78542320 0.8533989 0.9168327 0.04183445 0.8454827 0.4142525 0.8966100 0.2822953
## 200223270003_R06C01 0.2121179 0.8526849 0.5941114 0.6175380 0.3424621 0.8965339 0.8378237 0.5040948 0.02709928 0.7313884 0.6042521 0.05657120 0.0871902 0.3956683 0.8908661 0.8698740
## cg13739190 cg12543766 cg15138543 cg11401796 cg16715186 cg00696044 cg16655091 cg06115838 cg00084271 cg24883219 cg20673830 cg08788093 cg15586958 cg01153376 cg15600437 cg08669168
## 200223270003_R02C01 0.8510103 0.51028134 0.7734778 0.8453050 0.2742789 0.55608424 0.6055295 0.8847724 0.8103611 0.6430473 0.2422052 0.03911678 0.9058263 0.4872148 0.4885353 0.9226769
## 200223270003_R03C01 0.8358482 0.88741539 0.2949313 0.4319176 0.7946153 0.07552381 0.7053336 0.8447916 0.7877006 0.6822115 0.6881735 0.60934160 0.8957526 0.9639670 0.4894487 0.9164547
## 200223270003_R06C01 0.8419471 0.02818501 0.2496147 0.4370329 0.8124316 0.79270858 0.8724479 0.8805585 0.7706165 0.5296903 0.2134634 0.88380243 0.9121763 0.2242410 0.8551374 0.6362087
## cg23352245 cg22542451 cg16180556 cg11540596 cg22666875 cg23177161 cg10681981 cg02356645 cg24307368 cg18949721 cg02372404 cg07480955 cg04971651 cg00272795 cg14532717 cg25366315
## 200223270003_R02C01 0.9377232 0.5884356 0.39300141 0.9238951 0.8177182 0.4151698 0.7035090 0.5105903 0.64323677 0.2334245 0.03598249 0.3874638 0.8902474 0.46365138 0.5732280 0.9182318
## 200223270003_R03C01 0.9375774 0.8337068 0.07312155 0.8926595 0.8291957 0.4586576 0.7382662 0.5833923 0.34980461 0.2437792 0.02767285 0.3916889 0.9219452 0.82839260 0.1107638 0.9209800
## 200223270003_R06C01 0.5932742 0.8125084 0.20051805 0.8820252 0.3694180 0.8287312 0.6971989 0.5701428 0.02720398 0.2523095 0.03127855 0.4043390 0.9035233 0.07231279 0.6273416 0.8972984
## cg06394820 cg07138269 cg26081710 cg25758034 cg22112152 cg19301366 cg23658987 cg00819121 cg23923019 cg10091792 cg21507367 cg16779438 cg14710850 cg06118351 cg11019791 cg01910713
## 200223270003_R02C01 0.8513195 0.5002290 0.8751040 0.6114028 0.8476101 0.8831393 0.79757644 0.9207001 0.8555018 0.8670733 0.9268560 0.8826150 0.8048592 0.3633940 0.8112324 0.8573169
## 200223270003_R03C01 0.8695521 0.9426707 0.9198212 0.6649219 0.8014136 0.8072679 0.07511718 0.9281472 0.3058914 0.5864221 0.9290102 0.5466924 0.8090950 0.4714860 0.7831231 0.8538850
## 200223270003_R06C01 0.4415020 0.5057781 0.8801892 0.2393844 0.7897897 0.8796022 0.10177571 0.9327211 0.8108207 0.6087997 0.9039559 0.8629492 0.8285902 0.8655962 0.4353250 0.8110366
## cg21812850 cg22535849 cg03395511 cg08857872 cg20678988 cg02887598 cg06634367 cg12702014 cg01921484 cg12776173 cg06539076 cg00247094 cg06546677 cg25712921 cg14582632 cg01549082
## 200223270003_R02C01 0.7920645 0.8847704 0.4491605 0.3395280 0.8438718 0.04020908 0.8695793 0.7704049 0.9098550 0.1038804 0.8498176 0.5399349 0.4472216 0.2829848 0.8475098 0.2924138
## 200223270003_R03C01 0.7688711 0.8609966 0.4835967 0.8181845 0.8548886 0.67073881 0.9512930 0.7848681 0.9093137 0.8730635 0.5754432 0.9315640 0.8484609 0.6220919 0.5526692 0.7065693
## 200223270003_R06C01 0.7702792 0.8808022 0.5523959 0.2970779 0.7786685 0.73408417 0.9544163 0.8065993 0.9204487 0.7009491 0.5700959 0.5177874 0.5636023 0.6384003 0.5288675 0.2895440
## cg10993865 cg26948066 cg09015880 cg11133939 cg25277809 cg12421087 cg24634455 cg02225060 cg25169289 cg00512739 cg10978526 cg23066280 cg06880438 cg10666341 cg10240127 cg23432430
## 200223270003_R02C01 0.9173768 0.4685225 0.5101716 0.1282694 0.1632342 0.5647607 0.7796391 0.6828159 0.1100884 0.9337648 0.5671930 0.07247841 0.8285145 0.9046648 0.9250553 0.9482702
## 200223270003_R03C01 0.9096170 0.5026045 0.8402106 0.5920898 0.4913711 0.5399655 0.5188241 0.8265195 0.7667174 0.8863895 0.9095713 0.57174588 0.7988881 0.6731062 0.9403255 0.9455418
## 200223270003_R06C01 0.4904519 0.9101976 0.8472063 0.5127706 0.5952124 0.5400348 0.5325725 0.5209552 0.2264993 0.9242748 0.8945157 0.80814756 0.7839538 0.6443180 0.9056974 0.9418716
## cg16652920 cg12228670 cg07028768 cg26853071 cg06277607 cg07104639 cg14240646 cg06960717 cg00086247 cg06631775 cg09584650 cg01023242 cg27272246 cg10738049 cg12689021 cg20208879
## 200223270003_R02C01 0.9436000 0.8632174 0.4496851 0.4233820 0.10744587 0.6772717 0.5391334 0.7030978 0.1761275 0.8340699 0.08230254 0.7210683 0.8615873 0.5441211 0.7706828 0.66986658
## 200223270003_R03C01 0.9431222 0.8496212 0.8536078 0.7451354 0.09353494 0.7123879 0.2538363 0.7653402 0.2045043 0.8406280 0.09661586 0.9032685 0.8705287 0.5232715 0.7449475 0.02423079
## 200223270003_R06C01 0.9457161 0.8738949 0.8356936 0.4228079 0.09504696 0.8099688 0.1864902 0.7206218 0.6901217 0.9104546 0.52399749 0.7831190 0.8103777 0.4875473 0.7872237 0.61769424
## cg22741595 cg19097407 cg03924089 cg05850457 cg04664583 cg09216282 cg03982462 cg06715136 cg14649234 cg15501526 cg04248279 cg06833284 cg16571124 cg07158503 cg06371647 cg17671604
## 200223270003_R02C01 0.6525533 0.1417931 0.7920449 0.8183013 0.5572814 0.9349248 0.8562777 0.3400192 0.05165754 0.6362531 0.8534976 0.9125144 0.9282854 0.5777146 0.8336894 0.3134752
## 200223270003_R03C01 0.1730013 0.8367297 0.7370283 0.8313023 0.5881190 0.9244259 0.6023731 0.9259109 0.79015014 0.6319253 0.8458854 0.9003482 0.9206431 0.6203543 0.8198684 0.6325735
## 200223270003_R06C01 0.1550739 0.2276425 0.8506756 0.8161364 0.9352717 0.9263996 0.8778458 0.9079807 0.65413166 0.7435100 0.8332786 0.6097933 0.9276842 0.6236025 0.8069537 0.7054536
## cg14175932 cg26219488 cg03979311 cg18819889 cg05570109 cg02981548 cg25208881 cg08861434 cg04718469 cg00689685 cg24433124 cg17429539 cg00322003 cg07504457 cg05799088 cg18526121
## 200223270003_R02C01 0.5746953 0.9336638 0.86644909 0.9156157 0.3466611 0.1342571 0.1851956 0.8768306 0.8687522 0.7019389 0.1316610 0.7860900 0.1759911 0.7116230 0.9023317 0.4519781
## 200223270003_R03C01 0.8779027 0.9134707 0.06199853 0.9004455 0.5866750 0.5220037 0.9092286 0.4352647 0.7256813 0.8634268 0.5987648 0.7100923 0.5702070 0.6854539 0.8779381 0.4762313
## 200223270003_R06C01 0.7288239 0.9261878 0.72615553 0.9054439 0.4046471 0.5098965 0.9265502 0.8698813 0.8521881 0.6378795 0.8188082 0.7660838 0.3077122 0.7205633 0.6887230 0.4833367
## cg05155812 cg05876883 cg23517115 cg20398163 cg13405878 cg03672288 cg18816397 cg14016568 cg14687298 cg14627380 cg10864200 cg00154902 cg15098922 cg19242610 cg15985500 cg26901661
## 200223270003_R02C01 0.4514427 0.9039064 0.2151144 0.1728144 0.4549662 0.9235592 0.5472925 0.08344693 0.04206702 0.9455369 0.7380052 0.5137741 0.9286092 0.5188218 0.8555262 0.8951971
## 200223270003_R03C01 0.9070932 0.9223308 0.9131440 0.8728944 0.7858042 0.6718625 0.4940355 0.60983854 0.14813581 0.9258964 0.7421384 0.8540746 0.9027517 0.9236389 0.8312198 0.8754981
## 200223270003_R06C01 0.4107396 0.4697980 0.8328364 0.2623391 0.7583938 0.9007629 0.5337018 0.43697966 0.24260002 0.5789898 0.5945457 0.8188126 0.8525611 0.8761320 0.8492103 0.9021064
## cg10039445 cg00004073 cg07634717 cg03129555 cg21392220 cg11403739 cg06697310 cg02631626 cg17129965 cg06231502 cg12063064 cg14623940 cg09727210 cg18918831 cg21243064 cg27577781
## 200223270003_R02C01 0.8833873 0.02928535 0.7483382 0.6079616 0.8726204 0.3972310 0.8454609 0.6280766 0.8972140 0.7784451 0.9357515 0.7623774 0.4240111 0.4891660 0.5191606 0.8143535
## 200223270003_R03C01 0.8954055 0.02787198 0.8254434 0.5785498 0.8563905 0.5752869 0.8653044 0.1951736 0.8806673 0.7964278 0.9436901 0.8732905 0.8812928 0.5333801 0.9167649 0.8113185
## 200223270003_R06C01 0.8832807 0.64576463 0.8181246 0.9137818 0.8466199 0.7326415 0.2405168 0.2699849 0.8857237 0.7706160 0.5490657 0.8661720 0.8493743 0.6406575 0.4862205 0.8144274
## cg20685672 cg07478795 cg03660162 cg17042243 DX
## 200223270003_R02C01 0.6712101 0.8911007 0.8691767 0.2502905 MCI
## 200223270003_R03C01 0.7932091 0.9095543 0.5160770 0.2933475 CN
## 200223270003_R06C01 0.6613646 0.8905903 0.9026304 0.2725457 CN
## [ reached 'max' / getOption("max.print") -- omitted 3 rows ]
print(dim(processed_dataFrame))
## [1] 554 325
print(length(AfterProcess_FeatureName))
## [1] 325
print(head(processed_data))
## # A tibble: 6 × 325
## age.now PC1 PC2 PC3 cg18993517 cg13573375 cg02621446 cg24470466 cg08896901 cg12146221 cg05234269 cg14293999 cg19377607 cg14307563 cg17018422 cg08914944 cg21209485 cg11331837 cg11187460
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 82.4 -0.214 1.47e-2 -0.0140 0.209 0.867 0.873 0.773 0.358 0.205 0.938 0.284 0.0538 0.186 0.526 0.634 0.887 0.0369 0.0367
## 2 78.6 -0.173 5.75e-2 0.00506 0.267 0.173 0.810 0.904 0.247 0.181 0.575 0.917 0.906 0.892 0.903 0.0439 0.871 0.572 0.925
## 3 80.4 -0.00367 8.37e-2 0.0291 0.257 0.889 0.751 0.121 0.923 0.862 0.0247 0.917 0.0664 0.875 0.510 0.0689 0.229 0.0318 0.0311
## 4 62.9 0.0268 1.65e-5 0.0529 0.940 0.161 0.205 0.190 0.924 0.202 0.948 0.197 0.0634 0.876 0.0209 0.628 0.888 0.930 0.540
## 5 80.7 -0.0379 1.57e-2 -0.00869 0.950 0.851 0.796 0.207 0.264 0.138 0.563 0.903 0.916 0.917 0.0174 0.630 0.229 0.540 0.911
## 6 80.6 0.122 3.46e-2 0.0511 0.926 0.132 0.0470 0.909 0.272 0.142 0.573 0.173 0.596 0.891 0.0215 0.597 0.881 0.924 0.910
## # ℹ 306 more variables: cg09451339 <dbl>, cg11866463 <dbl>, cg06961873 <dbl>, cg14564293 <dbl>, cg02464073 <dbl>, cg07437923 <dbl>, cg12012426 <dbl>, cg23159970 <dbl>, cg10788927 <dbl>,
## # cg05392160 <dbl>, cg04540199 <dbl>, cg10890644 <dbl>, cg18285382 <dbl>, cg03796003 <dbl>, cg24851651 <dbl>, cg22071943 <dbl>, cg03549208 <dbl>, cg17653352 <dbl>, cg16405337 <dbl>,
## # cg04831745 <dbl>, cg07640670 <dbl>, cg25879395 <dbl>, cg09708852 <dbl>, cg05593887 <dbl>, cg26705599 <dbl>, cg07227024 <dbl>, cg21783012 <dbl>, cg03221390 <dbl>, cg04728936 <dbl>,
## # cg10985055 <dbl>, cg09289202 <dbl>, cg15912814 <dbl>, cg03327352 <dbl>, cg20300784 <dbl>, cg06286533 <dbl>, cg05130642 <dbl>, cg15491125 <dbl>, cg04462915 <dbl>, cg08880261 <dbl>,
## # cg25649515 <dbl>, cg08779649 <dbl>, cg22169467 <dbl>, cg00553601 <dbl>, cg03749159 <dbl>, cg26069044 <dbl>, cg03088219 <dbl>, cg17738613 <dbl>, cg02772171 <dbl>, cg17186592 <dbl>,
## # cg16956806 <dbl>, cg06012903 <dbl>, cg14192979 <dbl>, cg17906851 <dbl>, cg01933473 <dbl>, cg16089727 <dbl>, cg18857647 <dbl>, cg00767423 <dbl>, cg16771215 <dbl>, cg03737947 <dbl>,
## # cg26679884 <dbl>, cg22305850 <dbl>, cg26846609 <dbl>, cg21139150 <dbl>, cg25059696 <dbl>, cg21388339 <dbl>, cg13815695 <dbl>, cg26642936 <dbl>, cg04888234 <dbl>, cg08096656 <dbl>, …
print(dim(processed_data))
## [1] 554 325
print(AfterProcess_FeatureName)
## [1] "age.now" "PC1" "PC2" "PC3" "cg18993517" "cg13573375" "cg02621446" "cg24470466" "cg08896901" "cg12146221" "cg05234269" "cg14293999" "cg19377607" "cg14307563" "cg17018422"
## [16] "cg08914944" "cg21209485" "cg11331837" "cg11187460" "cg09451339" "cg11866463" "cg06961873" "cg14564293" "cg02464073" "cg07437923" "cg12012426" "cg23159970" "cg10788927" "cg05392160" "cg04540199"
## [31] "cg10890644" "cg18285382" "cg03796003" "cg24851651" "cg22071943" "cg03549208" "cg17653352" "cg16405337" "cg04831745" "cg07640670" "cg25879395" "cg09708852" "cg05593887" "cg26705599" "cg07227024"
## [46] "cg21783012" "cg03221390" "cg04728936" "cg10985055" "cg09289202" "cg15912814" "cg03327352" "cg20300784" "cg06286533" "cg05130642" "cg15491125" "cg04462915" "cg08880261" "cg25649515" "cg08779649"
## [61] "cg22169467" "cg00553601" "cg03749159" "cg26069044" "cg03088219" "cg17738613" "cg02772171" "cg17186592" "cg16956806" "cg06012903" "cg14192979" "cg17906851" "cg01933473" "cg16089727" "cg18857647"
## [76] "cg00767423" "cg16771215" "cg03737947" "cg26679884" "cg22305850" "cg26846609" "cg21139150" "cg25059696" "cg21388339" "cg13815695" "cg26642936" "cg04888234" "cg08096656" "cg03600007" "cg11438323"
## [91] "cg27086157" "cg15535896" "cg18698799" "cg12501287" "cg14228103" "cg08514194" "cg10738648" "cg08138245" "cg23836570" "cg09785377" "cg16536985" "cg12784167" "cg15633912" "cg02495179" "cg04875706"
## [106] "cg19471911" "cg20823859" "cg02078724" "cg03635532" "cg04242342" "cg09247979" "cg02246922" "cg25436480" "cg06483046" "cg12534577" "cg02550738" "cg01008088" "cg20078646" "cg11268585" "cg06864789"
## [121] "cg11882358" "cg04316537" "cg00939409" "cg26983017" "cg24697097" "cg06403901" "cg04768387" "cg17268094" "cg01128042" "cg17061760" "cg14507637" "cg16202259" "cg19799454" "cg03723481" "cg08198851"
## [136] "cg05891136" "cg04412904" "cg12953206" "cg11227702" "cg18150287" "cg25598710" "cg12333628" "cg22162835" "cg14168080" "cg27160885" "cg05161773" "cg11169344" "cg25306893" "cg14181112" "cg08455905"
## [151] "cg21415084" "cg00962106" "cg08745107" "cg11286989" "cg15775217" "cg24139837" "cg04645024" "cg22933800" "cg11314779" "cg21697769" "cg13739190" "cg12543766" "cg15138543" "cg11401796" "cg16715186"
## [166] "cg00696044" "cg16655091" "cg06115838" "cg00084271" "cg24883219" "cg20673830" "cg08788093" "cg15586958" "cg01153376" "cg15600437" "cg08669168" "cg23352245" "cg22542451" "cg16180556" "cg11540596"
## [181] "cg22666875" "cg23177161" "cg10681981" "cg02356645" "cg24307368" "cg18949721" "cg02372404" "cg07480955" "cg04971651" "cg00272795" "cg14532717" "cg25366315" "cg06394820" "cg07138269" "cg26081710"
## [196] "cg25758034" "cg22112152" "cg19301366" "cg23658987" "cg00819121" "cg23923019" "cg10091792" "cg21507367" "cg16779438" "cg14710850" "cg06118351" "cg11019791" "cg01910713" "cg21812850" "cg22535849"
## [211] "cg03395511" "cg08857872" "cg20678988" "cg02887598" "cg06634367" "cg12702014" "cg01921484" "cg12776173" "cg06539076" "cg00247094" "cg06546677" "cg25712921" "cg14582632" "cg01549082" "cg10993865"
## [226] "cg26948066" "cg09015880" "cg11133939" "cg25277809" "cg12421087" "cg24634455" "cg02225060" "cg25169289" "cg00512739" "cg10978526" "cg23066280" "cg06880438" "cg10666341" "cg10240127" "cg23432430"
## [241] "cg16652920" "cg12228670" "cg07028768" "cg26853071" "cg06277607" "cg07104639" "cg14240646" "cg06960717" "cg00086247" "cg06631775" "cg09584650" "cg01023242" "cg27272246" "cg10738049" "cg12689021"
## [256] "cg20208879" "cg22741595" "cg19097407" "cg03924089" "cg05850457" "cg04664583" "cg09216282" "cg03982462" "cg06715136" "cg14649234" "cg15501526" "cg04248279" "cg06833284" "cg16571124" "cg07158503"
## [271] "cg06371647" "cg17671604" "cg14175932" "cg26219488" "cg03979311" "cg18819889" "cg05570109" "cg02981548" "cg25208881" "cg08861434" "cg04718469" "cg00689685" "cg24433124" "cg17429539" "cg00322003"
## [286] "cg07504457" "cg05799088" "cg18526121" "cg05155812" "cg05876883" "cg23517115" "cg20398163" "cg13405878" "cg03672288" "cg18816397" "cg14016568" "cg14687298" "cg14627380" "cg10864200" "cg00154902"
## [301] "cg15098922" "cg19242610" "cg15985500" "cg26901661" "cg10039445" "cg00004073" "cg07634717" "cg03129555" "cg21392220" "cg11403739" "cg06697310" "cg02631626" "cg17129965" "cg06231502" "cg12063064"
## [316] "cg14623940" "cg09727210" "cg18918831" "cg21243064" "cg27577781" "cg20685672" "cg07478795" "cg03660162" "cg17042243" "DX"
print("Number of Features :")
## [1] "Number of Features :"
Num_feaForProcess = length(AfterProcess_FeatureName)-1 # exclude the "DX" label
print(Num_feaForProcess)
## [1] 324
df_LRM1<-processed_data
featureName_LRM1<-AfterProcess_FeatureName
library(glmnet)
library(caret)
set.seed(123) # for reproducibility
trainIndex <- createDataPartition(df_LRM1$DX, p = 0.7, list = FALSE)
trainData <- df_LRM1[trainIndex, ]
testData <- df_LRM1[-trainIndex, ]
dim(trainData)
## [1] 389 325
dim(testData)
## [1] 165 325
ctrl <- trainControl(method = "cv", number = 5)
model_LRM1 <- caret::train(DX ~ ., data = trainData, method = "glmnet", trControl = ctrl)
predictions <- predict(model_LRM1, newdata = testData,type="raw")
cm_modelTrain_LRM1 <- caret::confusionMatrix(predictions, testData$DX)
print(cm_modelTrain_LRM1)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CN MCI
## CN 53 12
## MCI 13 87
##
## Accuracy : 0.8485
## 95% CI : (0.7845, 0.8995)
## No Information Rate : 0.6
## P-Value [Acc > NIR] : 3.459e-12
##
## Kappa : 0.6835
##
## Mcnemar's Test P-Value : 1
##
## Sensitivity : 0.8030
## Specificity : 0.8788
## Pos Pred Value : 0.8154
## Neg Pred Value : 0.8700
## Prevalence : 0.4000
## Detection Rate : 0.3212
## Detection Prevalence : 0.3939
## Balanced Accuracy : 0.8409
##
## 'Positive' Class : CN
##
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
cm_modelTrain_LRM1_Accuracy<-cm_modelTrain_LRM1$overall["Accuracy"]
cm_modelTrain_LRM1_Kappa<-cm_modelTrain_LRM1$overall["Kappa"]
print(cm_modelTrain_LRM1_Accuracy)
## Accuracy
## 0.8484848
print(cm_modelTrain_LRM1_Kappa)
## Kappa
## 0.6835443
print(model_LRM1)
## glmnet
##
## 389 samples
## 324 predictors
## 2 classes: 'CN', 'MCI'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 311, 312, 311, 311, 311
## Resampling results across tuning parameters:
##
## alpha lambda Accuracy Kappa
## 0.10 0.0001780646 0.8353646 0.6491137
## 0.10 0.0017806455 0.8353646 0.6491137
## 0.10 0.0178064554 0.8328005 0.6431078
## 0.55 0.0001780646 0.7789211 0.5282468
## 0.55 0.0017806455 0.7789211 0.5282468
## 0.55 0.0178064554 0.7403929 0.4435675
## 1.00 0.0001780646 0.7583750 0.4852216
## 1.00 0.0017806455 0.7506827 0.4695577
## 1.00 0.0178064554 0.7069264 0.3742748
##
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0.1 and lambda = 0.001780646.
train_predictions <- predict(model_LRM1, newdata = trainData, type = "raw")
train_accuracy <- mean(train_predictions == trainData$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 1"
modelTrain_LRM1_trainAccuracy<-train_accuracy
print(modelTrain_LRM1_trainAccuracy)
## [1] 1
mean_accuracy_model_LRM1 <- mean(model_LRM1$results$Accuracy)
modelTrain_mean_accuracy_cv_LRM1 <- mean_accuracy_model_LRM1
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(modelTrain_mean_accuracy_cv_LRM1)
## [1] 0.7797499
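Note that this mean averages the CV accuracy over the entire tuning grid, including poorly tuned settings. The CV accuracy at the selected hyper-parameters is usually the more informative summary. A sketch using a mock results grid mirroring the table above; `merge()` keeps only the row matching `bestTune`:

```r
# Mock tuning results mirroring model_LRM1$results (values from the table above)
results <- data.frame(alpha    = c(0.10, 0.55, 1.00),
                      lambda   = c(0.0017806455, 0.0017806455, 0.0017806455),
                      Accuracy = c(0.8353646, 0.7789211, 0.7506827))
bestTune <- data.frame(alpha = 0.10, lambda = 0.0017806455)
# merge() joins on the shared columns alpha and lambda,
# keeping the row at the chosen setting
best_row <- merge(bestTune, results)
best_row$Accuracy  # 0.8353646
```

With the fitted object the same idea is `merge(model_LRM1$bestTune, model_LRM1$results)$Accuracy`.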
library(caret)
library(pROC)
if (METHOD_FEATURE_FLAG ==5){
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
roc_curve <- roc(testData$DX,
prob_predictions[, "MCI"],
levels = rev(levels(testData$DX)))
auc_value <- roc_curve$auc
print(roc_curve)
print("The auc value is:")
print(auc_value)
modelTrain_LRM1_AUC <- auc_value
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls > cases
##
## Call:
## roc.default(response = testData$DX, predictor = prob_predictions[, "MCI"], levels = rev(levels(testData$DX)))
##
## Data: prob_predictions[, "MCI"] in 99 controls (testData$DX MCI) > 66 cases (testData$DX CN).
## Area under the curve: 0.9151
## [1] "The auc value is:"
## Area under the curve: 0.9151
if (METHOD_FEATURE_FLAG ==4 || METHOD_FEATURE_FLAG==6 ){
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
roc_curve <- roc(testData$DX,
prob_predictions[, "Dementia"],
levels = rev(levels(testData$DX)))
auc_value <- roc_curve$auc
print(roc_curve)
print("The auc value is:")
print(auc_value)
modelTrain_LRM1_AUC <- auc_value
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG ==3){
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
roc_curve <- roc(testData$DX,
prob_predictions[, "CI"],
levels = rev(levels(testData$DX)))
auc_value <- roc_curve$auc
print(roc_curve)
print("The auc value is:")
print(auc_value)
modelTrain_LRM1_AUC <- auc_value
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG ==1){
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(testData$DX)
for (class in classes) {
binary_labels <- ifelse(testData$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
cols <- c("blue", seq_along(classes)[-1] + 1)
plot(roc_curves[[1]], col = cols[1],
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = cols[i], lwd = 2)
}
# use the same color vector so the legend matches the plotted curves
legend("bottomright", legend = classes, col = cols, lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
modelTrain_LRM1_AUC <- mean_auc
}
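The quantity computed here is the macro-averaged (one-versus-rest) AUC: the unweighted mean of the per-class AUCs. A toy sketch with hypothetical per-class values:

```r
# Hypothetical per-class AUCs (illustration only, not values from this run)
auc_values <- c(CN = 0.91, Dementia = 0.88, MCI = 0.79)
mean(auc_values)  # macro-averaged AUC: 0.86
```

Because the mean is unweighted, a poorly separated minority class pulls the macro AUC down just as much as a majority class would.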
print(modelTrain_LRM1_AUC)
## Area under the curve: 0.9151
importance_model_LRM1 <- varImp(model_LRM1)
print(importance_model_LRM1)
## glmnet variable importance
##
## only 20 most important variables shown (out of 324)
##
## Overall
## PC2 100.00
## cg27272246 62.06
## cg14710850 56.88
## cg23432430 55.63
## cg00004073 55.11
## cg13405878 54.62
## cg02981548 54.59
## cg14582632 51.72
## cg20685672 51.28
## cg08788093 51.25
## cg03924089 50.77
## cg21243064 49.95
## cg16652920 49.72
## cg02225060 49.32
## cg07480955 49.21
## cg24433124 48.91
## cg12543766 48.55
## cg14687298 48.48
## cg19471911 48.25
## cg17129965 48.25
plot(importance_model_LRM1, top = 20, main = "Variable Importance Plot")
importance_model_LRM1_df<-importance_model_LRM1$importance
if(METHOD_FEATURE_FLAG==3 || METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG==5 || METHOD_FEATURE_FLAG==6){
importance_final_model_LRM1 <- varImp(model_LRM1$finalModel)
library(dplyr)
# Note: arrange() drops data.frame rownames, so the table printed below shows
# row indices instead of CpG names; tibble::rownames_to_column() would keep them.
ordered_importance_final_model_LRM1 <- importance_final_model_LRM1 %>% arrange(desc(Overall))
print(ordered_importance_final_model_LRM1)
}
## Overall
## 1 4.342709708
## 2 2.695008548
## 3 2.470306237
## 4 2.415662696
## 5 2.393164117
## 6 2.372149246
## 7 2.370840668
## 8 2.245960513
## 9 2.226762451
## 10 2.225703408
## 11 2.204887563
## 12 2.169330142
## 13 2.159138387
## 14 2.141788902
## 15 2.136929632
## 16 2.123955669
## 17 2.108572611
## 18 2.105391751
## 19 2.095437287
## 20 2.095230799
## ... (rows 21-324 omitted: values decline monotonically from 2.052 to 0.005,
## and rows 290-324 are exactly 0, i.e. coefficients shrunk to zero by the penalty)
if(METHOD_FEATURE_FLAG==1){
# For the multi-class case, take each feature's maximum
# importance across the classes and add it as a MaxImportance column.
importance_model_LRM1_df$Feature<-rownames(importance_model_LRM1_df)
importance_model_LRM1_df <- importance_model_LRM1_df %>%
mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
arrange(desc(MaxImportance))
print(importance_model_LRM1_df)
}
# install reshape2 if it is not already available, then attach it
if (!requireNamespace("reshape2", quietly = TRUE)) {
install.packages("reshape2")
}
library(reshape2)
if(METHOD_FEATURE_FLAG == 1){
importance_melted_LRM1_df <- importance_model_LRM1_df %>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_LRM1_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
if(METHOD_FEATURE_FLAG == 1){
print(importance_model_LRM1_df %>% head(20))
print("The top 20 features based on the max-importance method:")
print(head(importance_model_LRM1_df,n=20)$Feature)
importance_melted_LRM1_df <- importance_model_LRM1_df %>%
head(20)%>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_LRM1_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
table(df_LRM1$DX)
##
## CN MCI
## 221 333
prop.table(table(df_LRM1$DX))
##
## CN MCI
## 0.398917 0.601083
table(trainData$DX)
##
## CN MCI
## 155 234
prop.table(table(trainData$DX))
##
## CN MCI
## 0.3984576 0.6015424
barplot(table(df_LRM1$DX), main = "Whole Data Class Distribution")
For the training data set:
barplot(table(trainData$DX), main = "Train Data Class Distribution")
Let’s calculate the imbalance ratio: the number of samples in the majority class divided by the number in the minority class. A high ratio indicates severe class imbalance.
class_counts <- table(df_LRM1$DX)
imbalance_ratio <- max(class_counts) / min(class_counts)
print("The imbalance ratio of the whole data set is:")
## [1] "The imbalance ratio of the whole data set is:"
print(imbalance_ratio)
## [1] 1.506787
class_counts <- table(trainData$DX)
imbalance_ratio <- max(class_counts) / min(class_counts)
print("The imbalance ratio of the training data set is:")
## [1] "The imbalance ratio of the training data set is:"
print(imbalance_ratio)
## [1] 1.509677
Let’s run a Chi-squared test, which determines whether the class distribution deviates significantly from a balanced distribution; the test’s p-value indicates the significance of the class imbalance.
chisq.test(table(df_LRM1$DX))
##
## Chi-squared test for given probabilities
##
## data: table(df_LRM1$DX)
## X-squared = 22.643, df = 1, p-value = 1.951e-06
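The goodness-of-fit statistic can be verified by hand: under the null of a balanced 50/50 split, the expected count for each class is n/2 and the statistic is the sum of (observed − expected)²/expected. Using the counts from the table above:

```r
obs  <- c(CN = 221, MCI = 333)           # observed class counts
expd <- rep(sum(obs) / 2, 2)             # expected under a 50/50 split: 277 each
x2   <- sum((obs - expd)^2 / expd)       # ~22.643, matching chisq.test() above
pchisq(x2, df = 1, lower.tail = FALSE)   # the corresponding p-value
```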
chisq.test(table(trainData$DX))
##
## Chi-squared test for given probabilities
##
## data: table(trainData$DX)
## X-squared = 16.044, df = 1, p-value = 6.19e-05
library(smotefamily)
smote_data_LGR_1 <- SMOTE(X = trainData[, !names(trainData) %in% "DX"], target = trainData$DX, K = 5, dup_size = 1)
balanced_data_LGR_1 <- smote_data_LGR_1$data
colnames(balanced_data_LGR_1)[colnames(balanced_data_LGR_1) == "class"] <- "DX"
table(balanced_data_LGR_1$DX)
##
## CN MCI
## 310 234
dim(balanced_data_LGR_1)
## [1] 544 325
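With `dup_size = 1`, `smotefamily::SMOTE` generates one synthetic sample per minority-class case, which is why CN doubled from 155 to 310; note this slightly overshoots, making CN the larger class. The residual imbalance can be checked directly (counts copied from the table above):

```r
post_counts <- c(CN = 310, MCI = 234)  # class counts after SMOTE (from above)
max(post_counts) / min(post_counts)    # ~1.32, down from ~1.51 before SMOTE
```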
ctrl <- trainControl(method = "cv", number = 5)
model_LRM2 <- caret::train(DX ~ ., data = balanced_data_LGR_1, method = "glmnet", trControl = ctrl)
predictions <- predict(model_LRM2, newdata = testData)
cm_modelTrain_LRM2<-caret::confusionMatrix(predictions, testData$DX)
print(cm_modelTrain_LRM2)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CN MCI
## CN 56 14
## MCI 10 85
##
## Accuracy : 0.8545
## 95% CI : (0.7913, 0.9045)
## No Information Rate : 0.6
## P-Value [Acc > NIR] : 9.051e-13
##
## Kappa : 0.7
##
## Mcnemar's Test P-Value : 0.5403
##
## Sensitivity : 0.8485
## Specificity : 0.8586
## Pos Pred Value : 0.8000
## Neg Pred Value : 0.8947
## Prevalence : 0.4000
## Detection Rate : 0.3394
## Detection Prevalence : 0.4242
## Balanced Accuracy : 0.8535
##
## 'Positive' Class : CN
##
cm_modelTrain_LRM2_Accuracy<-cm_modelTrain_LRM2$overall["Accuracy"]
cm_modelTrain_LRM2_Kappa<-cm_modelTrain_LRM2$overall["Kappa"]
print(cm_modelTrain_LRM2_Accuracy)
## Accuracy
## 0.8545455
print(cm_modelTrain_LRM2_Kappa)
## Kappa
## 0.7
print(model_LRM2)
## glmnet
##
## 544 samples
## 324 predictors
## 2 classes: 'CN', 'MCI'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 435, 435, 435, 435, 436
## Resampling results across tuning parameters:
##
## alpha lambda Accuracy Kappa
## 0.10 0.0001903546 0.9099218 0.8133355
## 0.10 0.0019035465 0.9117567 0.8170451
## 0.10 0.0190354647 0.9043833 0.8018562
## 0.55 0.0001903546 0.8915053 0.7751078
## 0.55 0.0019035465 0.8860007 0.7637244
## 0.55 0.0190354647 0.8528712 0.6923494
## 1.00 0.0001903546 0.8730887 0.7364321
## 1.00 0.0019035465 0.8602107 0.7098336
## 1.00 0.0190354647 0.8234455 0.6300338
##
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0.1 and lambda = 0.001903546.
train_predictions <- predict(model_LRM2, newdata = trainData, type = "raw")
train_accuracy <- mean(train_predictions == trainData$DX)
modelTrain_LRM2_trainAccuracy<-train_accuracy
print(paste("Training Accuracy: ", modelTrain_LRM2_trainAccuracy))
## [1] "Training Accuracy: 1"
mean_accuracy_model_LRM2 <- mean(model_LRM2$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_model_LRM2)
## [1] 0.8792426
modelTrain_LRM2_mean_accuracy_model_LRM2 <- mean_accuracy_model_LRM2
print(modelTrain_LRM2_mean_accuracy_model_LRM2)
## [1] 0.8792426
importance_model_LRM2 <- varImp(model_LRM2)
print(importance_model_LRM2)
## glmnet variable importance
##
## only 20 most important variables shown (out of 324)
##
## Overall
## PC2 100.00
## cg27272246 60.10
## cg00004073 58.87
## cg14710850 56.51
## cg02981548 53.65
## cg23432430 53.16
## cg13405878 52.49
## cg08788093 51.71
## cg21243064 51.56
## cg14582632 51.55
## cg02225060 51.28
## cg03924089 51.07
## cg06833284 51.03
## cg16652920 50.18
## cg20685672 49.41
## cg07480955 48.11
## cg17129965 47.70
## cg24433124 47.45
## cg11169344 47.01
## cg12543766 46.59
plot(importance_model_LRM2, top = 20, main = "Variable Importance Plot")
importance_model_LRM2_df<-importance_model_LRM2$importance
if(METHOD_FEATURE_FLAG==3||METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG ==5 || METHOD_FEATURE_FLAG == 6){
importance_final_model_LRM2 <- varImp(model_LRM2$finalModel)
library(dplyr)
ordered_importance_final_model_LRM2 <- importance_final_model_LRM2 %>% arrange(desc(Overall))
print(ordered_importance_final_model_LRM2)
}
## Overall
## 1 4.3907972675
## 2 2.6386679793
## 3 2.5850770449
## 4 2.4813661059
## 5 2.3557903914
## 6 2.3341745305
## 7 2.3046689528
## 8 2.2703611825
## 9 2.2639192178
## 10 2.2634932348
## 11 2.2518167537
## 12 2.2424587149
## 13 2.2408023813
## 14 2.2034946196
## 15 2.1695321515
## 16 2.1125462008
## 17 2.0945961742
## 18 2.0835343449
## 19 2.0639844195
## 20 2.0457479883
## ... (rows 21-324 omitted: values decline monotonically from 2.027 to 0.0003,
## and rows 292-324 are exactly 0, i.e. coefficients shrunk to zero by the penalty)
if(METHOD_FEATURE_FLAG==1){
# For the multi-class case, take each feature's maximum
# importance across the classes and add it as a MaxImportance column.
importance_model_LRM2_df$Feature<-rownames(importance_model_LRM2_df)
importance_model_LRM2_df <- importance_model_LRM2_df %>%
mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
arrange(desc(MaxImportance))
print(importance_model_LRM2_df)
}
if(METHOD_FEATURE_FLAG == 1){
importance_melted_LRM2_df <- importance_model_LRM2_df %>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_LRM2_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
if(METHOD_FEATURE_FLAG == 1){
print(importance_model_LRM2_df %>% head(20))
print("The top 20 features based on the max-importance method:")
print(head(importance_model_LRM2_df,n=20)$Feature)
importance_melted_LRM2_df <- importance_model_LRM2_df %>%
head(20)%>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_LRM2_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
if(METHOD_FEATURE_FLAG == 5){
prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
roc_curve <- roc(testData$DX, prob_predictions[, "MCI"], levels = rev(levels(testData$DX)))
auc_value <- roc_curve$auc
modelTrain_LRM2_AUC <-auc_value
print(roc_curve)
print("The auc value is:")
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls > cases
##
## Call:
## roc.default(response = testData$DX, predictor = prob_predictions[, "MCI"], levels = rev(levels(testData$DX)))
##
## Data: prob_predictions[, "MCI"] in 99 controls (testData$DX MCI) > 66 cases (testData$DX CN).
## Area under the curve: 0.9135
## [1] "The auc value is:"
## Area under the curve: 0.9135
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
roc_curve <- roc(testData$DX, prob_predictions[, "Dementia"], levels = rev(levels(testData$DX)))
auc_value <- roc_curve$auc
modelTrain_LRM2_AUC <-auc_value
print(roc_curve)
print("The auc value is:")
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
roc_curve <- roc(testData$DX, prob_predictions[, "CI"], levels = rev(levels(testData$DX)))
auc_value <- roc_curve$auc
modelTrain_LRM2_AUC <-auc_value
print(roc_curve)
print("The auc value is:")
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 1){
prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(testData$DX)
for (class in classes) {
binary_labels <- ifelse(testData$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
cols <- c("blue", seq_along(classes)[-1] + 1)
plot(roc_curves[[1]], col = cols[1],
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = cols[i], lwd = 2)
}
# use the same color vector so the legend matches the plotted curves
legend("bottomright", legend = classes, col = cols, lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
modelTrain_LRM2_AUC <-mean_auc
}
print(modelTrain_LRM2_AUC)
## Area under the curve: 0.9135
df_ENM1<-processed_data
featureName_ENM1<-AfterProcess_FeatureName
library(caret)
set.seed(123)
trainIndex <- createDataPartition(df_ENM1$DX, p = 0.7, list = FALSE)
trainData_ENM1 <- df_ENM1[trainIndex, ]
testData_ENM1 <- df_ENM1[-trainIndex, ]
ctrl <- trainControl(method = "cv", number = 5)
param_grid <- expand.grid(alpha = 0:1, lambda = seq(0.001, 1, length = 20))
elastic_net_model1 <- caret::train(DX ~ ., data = trainData_ENM1, method = "glmnet",
trControl = ctrl, tuneGrid = param_grid)
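One caveat about the grid above: `alpha = 0:1` evaluates only the two endpoints, ridge (alpha = 0) and lasso (alpha = 1), so no intermediate elastic-net mixtures are tried (visible in the results table below, which lists alpha 0 and 1 only). A finer grid could be used instead; this is a sketch, not the grid used in this run:

```r
# Finer elastic-net grid: 5 mixing values x 10 log-spaced penalties
param_grid_fine <- expand.grid(alpha  = seq(0, 1, by = 0.25),
                               lambda = 10^seq(-4, 0, length.out = 10))
nrow(param_grid_fine)  # 50 candidate settings
```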
print(elastic_net_model1)
## glmnet
##
## 389 samples
## 324 predictors
## 2 classes: 'CN', 'MCI'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 311, 312, 311, 311, 311
## Resampling results across tuning parameters:
##
## alpha lambda Accuracy Kappa
## 0 0.00100000 0.8225108 0.60397605
## 0 0.05357895 0.8225108 0.60397605
## 0 0.10615789 0.8225108 0.60397605
## 0 0.15873684 0.8225108 0.60397605
## 0 0.21131579 0.8225108 0.60397605
## 0 0.26389474 0.8225108 0.60397605
## 0 0.31647368 0.8225108 0.60397605
## 0 0.36905263 0.8225108 0.60397605
## 0 0.42163158 0.8225108 0.60397605
## 0 0.47421053 0.8225108 0.60397605
## 0 0.52678947 0.8225108 0.60397605
## 0 0.57936842 0.8225108 0.60397605
## 0 0.63194737 0.8225108 0.60397605
## 0 0.68452632 0.8225108 0.60397605
## 0 0.73710526 0.8225108 0.60397605
## 0 0.78968421 0.8225108 0.60397605
## 0 0.84226316 0.8225108 0.60397605
## 0 0.89484211 0.8225108 0.60397605
## 0 0.94742105 0.8225108 0.60397605
## 0 1.00000000 0.8225108 0.60397605
## 1 0.00100000 0.7583750 0.48522162
## 1 0.05357895 0.6092241 0.09287984
## 1 0.10615789 0.6015318 0.00000000
## 1 0.15873684 0.6015318 0.00000000
## 1 0.21131579 0.6015318 0.00000000
## 1 0.26389474 0.6015318 0.00000000
## 1 0.31647368 0.6015318 0.00000000
## 1 0.36905263 0.6015318 0.00000000
## 1 0.42163158 0.6015318 0.00000000
## 1 0.47421053 0.6015318 0.00000000
## 1 0.52678947 0.6015318 0.00000000
## 1 0.57936842 0.6015318 0.00000000
## 1 0.63194737 0.6015318 0.00000000
## 1 0.68452632 0.6015318 0.00000000
## 1 0.73710526 0.6015318 0.00000000
## 1 0.78968421 0.6015318 0.00000000
## 1 0.84226316 0.6015318 0.00000000
## 1 0.89484211 0.6015318 0.00000000
## 1 0.94742105 0.6015318 0.00000000
## 1 1.00000000 0.6015318 0.00000000
##
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0 and lambda = 1.
mean_accuracy_elastic_net_model1 <- mean(elastic_net_model1$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_elastic_net_model1)
## [1] 0.7161347
modelTrain_mean_accuracy_cv_ENM1 <- mean_accuracy_elastic_net_model1
print(modelTrain_mean_accuracy_cv_ENM1)
## [1] 0.7161347
train_predictions <- predict(elastic_net_model1, newdata = trainData_ENM1, type = "raw")
train_accuracy <- mean(train_predictions == trainData_ENM1$DX)
modelTrain_ENM1_trainAccuracy<-train_accuracy
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 0.93573264781491"
print(modelTrain_ENM1_trainAccuracy)
## [1] 0.9357326
predictions <- predict(elastic_net_model1, newdata = testData_ENM1)
cm_modelTrain_ENM1<- caret::confusionMatrix(predictions,testData_ENM1$DX)
print(cm_modelTrain_ENM1)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CN MCI
## CN 44 4
## MCI 22 95
##
## Accuracy : 0.8424
## 95% CI : (0.7777, 0.8944)
## No Information Rate : 0.6
## P-Value [Acc > NIR] : 1.263e-11
##
## Kappa : 0.6561
##
## Mcnemar's Test P-Value : 0.0008561
##
## Sensitivity : 0.6667
## Specificity : 0.9596
## Pos Pred Value : 0.9167
## Neg Pred Value : 0.8120
## Prevalence : 0.4000
## Detection Rate : 0.2667
## Detection Prevalence : 0.2909
## Balanced Accuracy : 0.8131
##
## 'Positive' Class : CN
##
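The significant McNemar p-value here reflects the asymmetric off-diagonal errors (22 CN predicted as MCI versus only 4 MCI predicted as CN). It can be reproduced by hand with the continuity-corrected statistic:

```r
b <- 4; c_ <- 22                          # off-diagonal counts from the matrix above
stat <- (abs(b - c_) - 1)^2 / (b + c_)    # continuity-corrected McNemar statistic
pchisq(stat, df = 1, lower.tail = FALSE)  # ~0.000856, matching the output above
```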
cm_modelTrain_ENM1_Accuracy <- cm_modelTrain_ENM1$overall["Accuracy"]
print(cm_modelTrain_ENM1_Accuracy)
## Accuracy
## 0.8424242
cm_modelTrain_ENM1_Kappa <- cm_modelTrain_ENM1$overall["Kappa"]
print(cm_modelTrain_ENM1_Kappa)
## Kappa
## 0.6560847
importance_elastic_net_model1<- varImp(elastic_net_model1)
print(importance_elastic_net_model1)
## glmnet variable importance
##
## only 20 most important variables shown (out of 324)
##
## Overall
## PC2 100.00
## cg23432430 63.97
## cg16652920 61.00
## cg07028768 60.37
## cg20685672 58.96
## cg02981548 57.95
## cg00962106 56.94
## cg03924089 56.80
## cg27272246 56.59
## cg09015880 55.91
## cg00086247 55.83
## cg06833284 54.89
## cg14710850 54.22
## cg24433124 54.17
## cg13405878 53.34
## cg12543766 53.03
## cg17129965 52.66
## cg06634367 51.12
## cg14687298 49.76
## cg07480955 49.32
plot(importance_elastic_net_model1, top = 20, main = "Variable Importance Plot")
importance_elastic_net_model1_df<-importance_elastic_net_model1$importance
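The importance data frame extracted above can feed the Top-N selection described in the introduction. A minimal sketch (the helper name `select_top_features` is hypothetical; it assumes a one-column `Overall` importance data frame with feature names as rownames, as `varImp()` returns for a binary glmnet model):

```r
# Hypothetical helper: keep the Top-N features ranked by importance.
# `imp_df` has rownames = feature names and a numeric `Overall` column.
select_top_features <- function(imp_df, n = 20) {
  ord <- imp_df[order(-imp_df$Overall), , drop = FALSE]
  head(rownames(ord), n)
}
# Example: top_cpgs <- select_top_features(importance_elastic_net_model1_df, 20)
```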
if(METHOD_FEATURE_FLAG %in% c(3, 4, 5, 6)){
importance_elastic_net_final_model1 <- varImp(elastic_net_model1$finalModel)
library(dplyr)
Ordered_importance_elastic_net_final_model1 <- importance_elastic_net_final_model1 %>% arrange(desc(Overall))
print(Ordered_importance_elastic_net_final_model1)
}
## Overall
## 1 0.352038526
## 2 0.226049944
## 3 0.215660486
## 4 0.213465184
## 5 0.208521737
## 6 0.205001650
## 7 0.201469337
## 8 0.200991170
## 9 0.200252997
## 10 0.197863248
## ... (rows 11-324 omitted; importance decreases monotonically to 0.002358855)
if(METHOD_FEATURE_FLAG==1){
# for the multi classification case,
# for each feature, we will choose the maximum importance value
# Add a column for the maximum importance
importance_elastic_net_model1_df$Feature<-rownames(importance_elastic_net_model1_df)
importance_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
arrange(desc(MaxImportance))
print(importance_elastic_net_model1_df)
}
if(METHOD_FEATURE_FLAG == 1){
library(reshape2) # provides melt()
importance_melted_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_elastic_net_model1_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
if(METHOD_FEATURE_FLAG == 1){
print(importance_elastic_net_model1_df %>% head(20))
print("the top 20 features based on max way:")
print(head(importance_elastic_net_model1_df,n=20)$Feature)
importance_melted_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
head(20)%>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_elastic_net_model1_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
if(METHOD_FEATURE_FLAG == 5){
prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
roc_curve <- roc(testData_ENM1$DX,
prob_predictions[, "MCI"],
levels = rev(levels(testData_ENM1$DX)))
auc_value <- roc_curve$auc
modelTrain_ENM1_AUC <-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls > cases
## Area under the curve: 0.9474
if(METHOD_FEATURE_FLAG %in% c(4, 6)){
prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
roc_curve <- roc(testData_ENM1$DX,
prob_predictions[, "Dementia"],
levels = rev(levels(testData_ENM1$DX)))
auc_value <- roc_curve$auc
modelTrain_ENM1_AUC <-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
roc_curve <- roc(testData_ENM1$DX,
prob_predictions[, "CI"],
levels = rev(levels(testData_ENM1$DX)))
auc_value <- roc_curve$auc
modelTrain_ENM1_AUC <-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG ==1){
prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(testData_ENM1$DX)
for (class in classes) {
binary_labels <- ifelse(testData_ENM1$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
plot(roc_curves[[1]], col = 2,
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = i + 1, lwd = 2)
}
legend("bottomright", legend = classes, col = (1:length(classes)) + 1, lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
modelTrain_ENM1_AUC <-mean_auc
}
print(modelTrain_ENM1_AUC)
## Area under the curve: 0.9474
library(caret)
library(xgboost)
library(dplyr)
library(doParallel)
# Start point of parallel processing
numCores <- detectCores() - 1
c2 <- makeCluster(numCores)
registerDoParallel(c2)
df_XGB1<-processed_data
featureName_XGB1<-AfterProcess_FeatureName
set.seed(123)
trainIndex <- createDataPartition(df_XGB1$DX, p = 0.7, list = FALSE)
trainData_XGB1<- df_XGB1[trainIndex, ]
testData_XGB1 <- df_XGB1[-trainIndex, ]
cv_control <- trainControl(method = "cv", number = 5, allowParallel = TRUE)
xgb_model <- caret::train(
DX ~ ., data = trainData_XGB1,
method = "xgbTree", trControl = cv_control,
metric = "Accuracy"
)
print(xgb_model)
## eXtreme Gradient Boosting
##
## 389 samples
## 324 predictors
## 2 classes: 'CN', 'MCI'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 311, 312, 311, 311, 311
## Resampling results across tuning parameters:
##
## eta max_depth colsample_bytree subsample nrounds Accuracy Kappa
## 0.3 1 0.6 0.50 50 0.5988012 0.14149522
## 0.3 1 0.6 0.50 100 0.6475191 0.24237930
## 0.3 1 0.6 0.50 150 0.6887113 0.33501663
## 0.3 1 0.6 0.75 50 0.5784216 0.09486944
## 0.3 1 0.6 0.75 100 0.6064935 0.16157725
## 0.3 1 0.6 0.75 150 0.6349983 0.21954732
## 0.3 1 0.6 1.00 50 0.5861139 0.09908243
## 0.3 1 0.6 1.00 100 0.5937396 0.13275580
## 0.3 1 0.6 1.00 150 0.6271395 0.20316277
## 0.3 1 0.8 0.50 50 0.6298368 0.20856640
## 0.3 1 0.8 0.50 100 0.6119547 0.18005954
## 0.3 1 0.8 0.50 150 0.6428571 0.24708738
## 0.3 1 0.8 0.75 50 0.6143856 0.14864524
## 0.3 1 0.8 0.75 100 0.6040959 0.16615806
## 0.3 1 0.8 0.75 150 0.6324675 0.21620403
## 0.3 1 0.8 1.00 50 0.6117882 0.14828797
## 0.3 1 0.8 1.00 100 0.6118548 0.16244266
## 0.3 1 0.8 1.00 150 0.6144522 0.17721361
## 0.3 2 0.6 0.50 50 0.6581086 0.25849105
## 0.3 2 0.6 0.50 100 0.6631369 0.28262994
## 0.3 2 0.6 0.50 150 0.6888445 0.33583732
## 0.3 2 0.6 0.75 50 0.6607060 0.26372044
## 0.3 2 0.6 0.75 100 0.6708958 0.28545959
## 0.3 2 0.6 0.75 150 0.6605728 0.26580719
## 0.3 2 0.6 1.00 50 0.6064602 0.13914792
## 0.3 2 0.6 1.00 100 0.6219447 0.17187858
## 0.3 2 0.6 1.00 150 0.6347985 0.20204263
## 0.3 2 0.8 0.50 50 0.6400266 0.21877830
## 0.3 2 0.8 0.50 100 0.6604396 0.25549173
## 0.3 2 0.8 0.50 150 0.6836830 0.30544284
## 0.3 2 0.8 0.75 50 0.6169497 0.17303705
## 0.3 2 0.8 0.75 100 0.6451215 0.23765343
## 0.3 2 0.8 0.75 150 0.6580420 0.26339036
## 0.3 2 0.8 1.00 50 0.6091242 0.15139762
## 0.3 2 0.8 1.00 100 0.6425907 0.22674500
## 0.3 2 0.8 1.00 150 0.6451548 0.23022341
## 0.3 3 0.6 0.50 50 0.6863470 0.31582588
## 0.3 3 0.6 0.50 100 0.6940726 0.34137825
## 0.3 3 0.6 0.50 150 0.6890110 0.33043945
## 0.3 3 0.6 0.75 50 0.6582085 0.24594926
## 0.3 3 0.6 0.75 100 0.6684982 0.26851103
## 0.3 3 0.6 0.75 150 0.6787546 0.28984955
## 0.3 3 0.6 1.00 50 0.6297369 0.18119501
## 0.3 3 0.6 1.00 100 0.6605395 0.25842187
## 0.3 3 0.6 1.00 150 0.6682651 0.27277371
## 0.3 3 0.8 0.50 50 0.6579754 0.25375482
## 0.3 3 0.8 0.50 100 0.6785881 0.30141637
## 0.3 3 0.8 0.50 150 0.6760573 0.29606343
## 0.3 3 0.8 0.75 50 0.6451881 0.22239219
## 0.3 3 0.8 0.75 100 0.6555112 0.24953966
## 0.3 3 0.8 0.75 150 0.6682651 0.27874043
## 0.3 3 0.8 1.00 50 0.6245088 0.16927487
## 0.3 3 0.8 1.00 100 0.6553447 0.23852372
## 0.3 3 0.8 1.00 150 0.6502498 0.23175676
## 0.4 1 0.6 0.50 50 0.5989344 0.14708297
## 0.4 1 0.6 0.50 100 0.6710956 0.29903791
## 0.4 1 0.6 0.50 150 0.6837496 0.32505847
## 0.4 1 0.6 0.75 50 0.5860473 0.11223790
## 0.4 1 0.6 0.75 100 0.6298701 0.21675793
## 0.4 1 0.6 0.75 150 0.6606394 0.27161864
## 0.4 1 0.6 1.00 50 0.5782551 0.07994362
## 0.4 1 0.6 1.00 100 0.6118215 0.17216804
## 0.4 1 0.6 1.00 150 0.6195138 0.18171375
## 0.4 1 0.8 0.50 50 0.6040626 0.15491262
## 0.4 1 0.8 0.50 100 0.6684982 0.29393678
## 0.4 1 0.8 0.50 150 0.6889111 0.34321981
## 0.4 1 0.8 0.75 50 0.5732268 0.06833334
## 0.4 1 0.8 0.75 100 0.6272394 0.19356483
## 0.4 1 0.8 0.75 150 0.6374958 0.21412150
## 0.4 1 0.8 1.00 50 0.6014652 0.13988197
## 0.4 1 0.8 1.00 100 0.6090909 0.16718926
## 0.4 1 0.8 1.00 150 0.6399267 0.23898774
## 0.4 2 0.6 0.50 50 0.6685315 0.28515161
## 0.4 2 0.6 0.50 100 0.6581419 0.26305995
## 0.4 2 0.6 0.50 150 0.6658342 0.27777179
## 0.4 2 0.6 0.75 50 0.6324009 0.20891687
## 0.4 2 0.6 0.75 100 0.6504829 0.25066250
## 0.4 2 0.6 0.75 150 0.6684316 0.28967404
## 0.4 2 0.6 1.00 50 0.6322344 0.20771492
## 0.4 2 0.6 1.00 100 0.6606061 0.27024673
## 0.4 2 0.6 1.00 150 0.6734266 0.29355387
## 0.4 2 0.8 0.50 50 0.6425241 0.23224359
## 0.4 2 0.8 0.50 100 0.6527473 0.26111221
## 0.4 2 0.8 0.50 150 0.6809857 0.31840033
## 0.4 2 0.8 0.75 50 0.6272727 0.19145001
## 0.4 2 0.8 0.75 100 0.6605728 0.26250074
## 0.4 2 0.8 0.75 150 0.6580087 0.26483731
## 0.4 2 0.8 1.00 50 0.6500500 0.23397060
## 0.4 2 0.8 1.00 100 0.6603730 0.26610861
## 0.4 2 0.8 1.00 150 0.6758242 0.29983554
## 0.4 3 0.6 0.50 50 0.6529138 0.23533815
## 0.4 3 0.6 0.50 100 0.6658009 0.26299777
## 0.4 3 0.6 0.50 150 0.6862471 0.30780758
## 0.4 3 0.6 0.75 50 0.6554779 0.26288497
## 0.4 3 0.6 0.75 100 0.6606061 0.27459833
## 0.4 3 0.6 0.75 150 0.6553780 0.26035328
## 0.4 3 0.6 1.00 50 0.6219780 0.18208017
## 0.4 3 0.6 1.00 100 0.6477189 0.23558051
## 0.4 3 0.6 1.00 150 0.6606061 0.26764730
## 0.4 3 0.8 0.50 50 0.6194472 0.16432624
## 0.4 3 0.8 0.50 100 0.6503164 0.23417327
## 0.4 3 0.8 0.50 150 0.6477855 0.23096871
## 0.4 3 0.8 0.75 50 0.6478188 0.23604606
## 0.4 3 0.8 0.75 100 0.6632368 0.26695052
## 0.4 3 0.8 0.75 150 0.6862471 0.31531091
## 0.4 3 0.8 1.00 50 0.6374292 0.20344943
## 0.4 3 0.8 1.00 100 0.6348318 0.19990138
## 0.4 3 0.8 1.00 150 0.6245754 0.17640121
##
## Tuning parameter 'gamma' was held constant at a value of 0
## Tuning parameter 'min_child_weight' was held constant at a value of 1
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were nrounds = 100, max_depth = 3, eta = 0.3, gamma = 0, colsample_bytree = 0.6, min_child_weight = 1 and subsample = 0.5.
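The winning grid row reported above is also stored directly on the caret train object, which is handier than parsing the printed summary:

```r
# caret keeps the selected hyperparameter combination in $bestTune:
# a one-row data frame (nrounds, max_depth, eta, gamma, colsample_bytree,
# min_child_weight, subsample).
print(xgb_model$bestTune)
```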
mean_accuracy_xgb_model<- mean(xgb_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_xgb_model)
## [1] 0.6442043
modelTrain_mean_accuracy_cv_xgb <- mean_accuracy_xgb_model
print(modelTrain_mean_accuracy_cv_xgb)
## [1] 0.6442043
train_predictions <- predict(xgb_model, newdata = trainData_XGB1, type = "raw")
train_accuracy <- mean(train_predictions == trainData_XGB1$DX)
modelTrain_xgb_trainAccuracy <- train_accuracy
print(paste("Training Accuracy: ", modelTrain_xgb_trainAccuracy))
## [1] "Training Accuracy: 1"
predictions <- predict(xgb_model, newdata = testData_XGB1)
cm_modelTrain_xgb <- caret::confusionMatrix(predictions,testData_XGB1$DX)
print(cm_modelTrain_xgb)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CN MCI
## CN 35 20
## MCI 31 79
##
## Accuracy : 0.6909
## 95% CI : (0.6144, 0.7604)
## No Information Rate : 0.6
## P-Value [Acc > NIR] : 0.009814
##
## Kappa : 0.3377
##
## Mcnemar's Test P-Value : 0.161429
##
## Sensitivity : 0.5303
## Specificity : 0.7980
## Pos Pred Value : 0.6364
## Neg Pred Value : 0.7182
## Prevalence : 0.4000
## Detection Rate : 0.2121
## Detection Prevalence : 0.3333
## Balanced Accuracy : 0.6641
##
## 'Positive' Class : CN
##
cm_modelTrain_xgb_Accuracy <- cm_modelTrain_xgb$overall["Accuracy"]
cm_modelTrain_xgb_Kappa <- cm_modelTrain_xgb$overall["Kappa"]
print(cm_modelTrain_xgb_Accuracy)
## Accuracy
## 0.6909091
print(cm_modelTrain_xgb_Kappa)
## Kappa
## 0.3376623
importance_xgb_model<- varImp(xgb_model)
print(importance_xgb_model)
## xgbTree variable importance
##
## only 20 most important variables shown (out of 324)
##
## Overall
## cg22535849 100.00
## cg11331837 86.16
## cg12543766 83.52
## cg19799454 80.07
## age.now 76.97
## cg04412904 75.49
## cg06961873 66.15
## cg23836570 65.49
## cg20685672 57.96
## cg20078646 57.83
## cg10240127 53.35
## cg14687298 47.90
## cg04316537 47.09
## cg18285382 46.61
## cg08880261 46.34
## cg04718469 46.00
## PC1 45.86
## cg16652920 45.85
## cg27272246 44.87
## cg02621446 44.81
plot(importance_xgb_model, top = 20, main = "Variable Importance Plot")
importance_xgb_model_df<-importance_xgb_model$importance
importance <- xgb.importance(model = xgb_model$finalModel)
xgb.plot.importance(importance_matrix = importance)
ordered_importance <- importance[order(-importance$Importance), ]
print(ordered_importance)
## Feature Gain Cover Frequency Importance
## <char> <num> <num> <num> <num>
## 1: cg22535849 2.616066e-02 0.0223517413 0.008948546 2.616066e-02
## 2: cg11331837 2.253948e-02 0.0121370755 0.011185682 2.253948e-02
## 3: cg12543766 2.184880e-02 0.0164798095 0.013422819 2.184880e-02
## 4: cg19799454 2.094654e-02 0.0133953048 0.004474273 2.094654e-02
## 5: age.now 2.013668e-02 0.0173745021 0.013422819 2.013668e-02
## ---
## 222: cg05850457 1.348179e-04 0.0007069001 0.002237136 1.348179e-04
## 223: cg14240646 1.209097e-04 0.0007217576 0.002237136 1.209097e-04
## 224: cg03395511 7.811054e-05 0.0005659188 0.002237136 7.811054e-05
## 225: cg15586958 7.308457e-05 0.0006565421 0.002237136 7.308457e-05
## 226: cg13815695 1.634269e-05 0.0005030445 0.002237136 1.634269e-05
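For the XGBoost model, the native `xgb.importance()` table above ranks predictors by Gain, so Top-N selection can work off that column directly. A hedged sketch (the helper name `top_n_xgb` is hypothetical):

```r
# Hypothetical helper: keep the Top-N predictors by Gain from the
# xgb.importance() data.table built above (columns Feature, Gain, ...).
top_n_xgb <- function(importance, n = 20) {
  imp <- importance[order(-importance$Gain), ]
  head(imp$Feature, n)
}
# Example: top_cpgs_xgb <- top_n_xgb(importance, 20)
```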
stopCluster(c2)
registerDoSEQ()
if(METHOD_FEATURE_FLAG == 5){
prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
roc_curve <- roc(testData_XGB1$DX,
prob_predictions[, "MCI"],
levels = rev(levels(testData_XGB1$DX)))
auc_value <- roc_curve$auc
modelTrain_xgb_AUC<-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls > cases
## Area under the curve: 0.7334
if(METHOD_FEATURE_FLAG %in% c(4, 6)){
prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
roc_curve <- roc(testData_XGB1$DX,
prob_predictions[, "Dementia"],
levels = rev(levels(testData_XGB1$DX)))
auc_value <- roc_curve$auc
modelTrain_xgb_AUC<-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
roc_curve <- roc(testData_XGB1$DX,
prob_predictions[, "CI"],
levels = rev(levels(testData_XGB1$DX)))
auc_value <- roc_curve$auc
modelTrain_xgb_AUC<-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 1){
prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(testData_XGB1$DX)
for (class in classes) {
binary_labels <- ifelse(testData_XGB1$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
plot(roc_curves[[1]], col = 2,
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = i + 1, lwd = 2)
}
legend("bottomright", legend = classes, col = (1:length(classes)) + 1, lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
modelTrain_xgb_AUC<-mean_auc
}
print(modelTrain_xgb_AUC)
## Area under the curve: 0.7334
library(caret)
library(randomForest)
df_RFM1<-processed_data
featureName_RFM1<-AfterProcess_FeatureName
set.seed(123)
trainIndex <- createDataPartition(df_RFM1$DX, p = 0.7, list = FALSE)
train_data_RFM1 <- df_RFM1[trainIndex, ]
test_data_RFM1 <- df_RFM1[-trainIndex, ]
X_train_RFM1 <- subset(train_data_RFM1, select = -DX)
y_train_RFM1 <- train_data_RFM1$DX
X_test_RFM1 <- subset(test_data_RFM1, select = -DX)
y_test_RFM1 <- test_data_RFM1$DX
ctrl <- trainControl(method = "cv", number = 5, classProbs = TRUE)
rf_model <- caret::train(
DX ~ ., data = train_data_RFM1,
method = "rf", trControl = ctrl,
metric = "Accuracy",
importance = TRUE
)
print(rf_model)
## Random Forest
##
## 389 samples
## 324 predictors
## 2 classes: 'CN', 'MCI'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 311, 312, 311, 311, 311
## Resampling results across tuning parameters:
##
## mtry Accuracy Kappa
## 2 0.6169497 0.04867402
## 163 0.6426573 0.14000206
## 324 0.6479187 0.15252993
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 324.
mean_accuracy_rf_model<- mean(rf_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
modelTrain_mean_accuracy_cv_rf <- mean_accuracy_rf_model
print(modelTrain_mean_accuracy_cv_rf)
## [1] 0.6358419
train_predictions <- predict(rf_model, newdata = train_data_RFM1, type = "raw")
train_accuracy <- mean(train_predictions == train_data_RFM1$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 1"
modelTrain_rf_trainAccuracy <- train_accuracy
print(modelTrain_rf_trainAccuracy)
## [1] 1
predictions <- predict(rf_model, newdata = test_data_RFM1)
cm_modelTrain_rf <- caret::confusionMatrix(predictions,test_data_RFM1$DX)
print(cm_modelTrain_rf)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CN MCI
## CN 13 2
## MCI 53 97
##
## Accuracy : 0.6667
## 95% CI : (0.5892, 0.738)
## No Information Rate : 0.6
## P-Value [Acc > NIR] : 0.04649
##
## Kappa : 0.2029
##
## Mcnemar's Test P-Value : 1.562e-11
##
## Sensitivity : 0.19697
## Specificity : 0.97980
## Pos Pred Value : 0.86667
## Neg Pred Value : 0.64667
## Prevalence : 0.40000
## Detection Rate : 0.07879
## Detection Prevalence : 0.09091
## Balanced Accuracy : 0.58838
##
## 'Positive' Class : CN
##
cm_modelTrain_rf_Accuracy <- cm_modelTrain_rf$overall["Accuracy"]
cm_modelTrain_rf_Kappa <- cm_modelTrain_rf$overall["Kappa"]
print(cm_modelTrain_rf_Accuracy)
## Accuracy
## 0.6666667
print(cm_modelTrain_rf_Kappa)
## Kappa
## 0.2028986
importance_rf_model <- varImp(rf_model)
print(importance_rf_model)
## rf variable importance
##
## only 20 most important variables shown (out of 324)
##
## Importance
## age.now 100.00
## cg06286533 98.01
## cg03749159 83.58
## cg18526121 79.83
## cg24851651 78.38
## cg23658987 76.80
## cg26081710 76.56
## cg27086157 74.68
## cg08880261 74.44
## cg15501526 74.37
## cg10864200 73.82
## cg02464073 72.78
## cg09289202 72.26
## cg18819889 71.95
## cg14168080 70.64
## cg12228670 69.69
## cg27160885 69.36
## cg03979311 69.05
## cg04412904 68.81
## cg00004073 68.79
plot(importance_rf_model, top = 20, main = "Variable Importance Plot")
importance_rf_model_df<-importance_rf_model$importance
if(METHOD_FEATURE_FLAG==5){
importance_rf_final_model <- varImp(rf_model$finalModel)
library(dplyr)
Ordered_importance_rf_final_model <- importance_rf_final_model %>% arrange(desc(MCI))
print(Ordered_importance_rf_final_model)
}
## CN MCI
## 1 3.19403610 3.19403610
## 2 3.08106011 3.08106011
## 3 2.26054419 2.26054419
## 4 2.04727189 2.04727189
## 5 1.96484449 1.96484449
## 6 1.87503122 1.87503122
## 7 1.86151587 1.86151587
## 8 1.75467775 1.75467775
## 9 1.74075253 1.74075253
## 10 1.73667238 1.73667238
## 11 1.70555686 1.70555686
## 12 1.64660479 1.64660479
## 13 1.61705167 1.61705167
## 14 1.59901643 1.59901643
## 15 1.52476918 1.52476918
## 16 1.47064643 1.47064643
## 17 1.45168912 1.45168912
## 18 1.43405450 1.43405450
## 19 1.42082656 1.42082656
## 20 1.41980817 1.41980817
## 21 1.41792119 1.41792119
## 22 1.40934480 1.40934480
## 23 1.38865871 1.38865871
## 24 1.38192497 1.38192497
## 25 1.37632165 1.37632165
## 26 1.35017111 1.35017111
## 27 1.34655532 1.34655532
## 28 1.29326102 1.29326102
## 29 1.25904236 1.25904236
## 30 1.23284110 1.23284110
## 31 1.22614907 1.22614907
## 32 1.21293277 1.21293277
## 33 1.20663344 1.20663344
## 34 1.20597313 1.20597313
## 35 1.17730794 1.17730794
## 36 1.17337166 1.17337166
## 37 1.15918941 1.15918941
## 38 1.12637725 1.12637725
## 39 1.12303109 1.12303109
## 40 1.08362963 1.08362963
## 41 1.07590166 1.07590166
## 42 1.07101150 1.07101150
## 43 1.05433413 1.05433413
## 44 1.04511980 1.04511980
## 45 1.03888881 1.03888881
## 46 1.03601414 1.03601414
## 47 1.02789446 1.02789446
## 48 1.02106719 1.02106719
## 49 1.01848937 1.01848937
## 50 1.01246021 1.01246021
## 51 0.98180708 0.98180708
## 52 0.95077447 0.95077447
## 53 0.93941528 0.93941528
## 54 0.93640316 0.93640316
## 55 0.93532949 0.93532949
## 56 0.91749769 0.91749769
## 57 0.90194464 0.90194464
## 58 0.89616662 0.89616662
## 59 0.89220433 0.89220433
## 60 0.87935084 0.87935084
## 61 0.87754982 0.87754982
## 62 0.86990503 0.86990503
## 63 0.86944442 0.86944442
## 64 0.86296392 0.86296392
## 65 0.84458090 0.84458090
## 66 0.84154153 0.84154153
## 67 0.82098511 0.82098511
## 68 0.81334066 0.81334066
## 69 0.80939063 0.80939063
## 70 0.80706538 0.80706538
## 71 0.80058836 0.80058836
## 72 0.79696096 0.79696096
## 73 0.79624548 0.79624548
## 74 0.78714783 0.78714783
## 75 0.76120395 0.76120395
## 76 0.75859281 0.75859281
## 77 0.75070895 0.75070895
## 78 0.74748825 0.74748825
## 79 0.74687239 0.74687239
## 80 0.74061291 0.74061291
## 81 0.73470690 0.73470690
## 82 0.71276232 0.71276232
## 83 0.70700388 0.70700388
## 84 0.67259485 0.67259485
## 85 0.66817860 0.66817860
## 86 0.65649418 0.65649418
## 87 0.62725930 0.62725930
## 88 0.61968961 0.61968961
## 89 0.61737412 0.61737412
## 90 0.61144051 0.61144051
## 91 0.60804776 0.60804776
## 92 0.60068274 0.60068274
## 93 0.59504931 0.59504931
## 94 0.58515633 0.58515633
## 95 0.56695537 0.56695537
## 96 0.55647821 0.55647821
## 97 0.55078003 0.55078003
## 98 0.54974739 0.54974739
## 99 0.54369931 0.54369931
## 100 0.53557590 0.53557590
## 101 0.52419246 0.52419246
## 102 0.52031046 0.52031046
## 103 0.50409853 0.50409853
## 104 0.49110536 0.49110536
## 105 0.48607972 0.48607972
## 106 0.48384976 0.48384976
## 107 0.47766720 0.47766720
## 108 0.46810380 0.46810380
## 109 0.46621406 0.46621406
## 110 0.46493725 0.46493725
## 111 0.45360914 0.45360914
## 112 0.45198912 0.45198912
## 113 0.42939618 0.42939618
## 114 0.42533018 0.42533018
## 115 0.42125822 0.42125822
## 116 0.40585635 0.40585635
## 117 0.38866601 0.38866601
## 118 0.37187411 0.37187411
## 119 0.36789462 0.36789462
## 120 0.35860044 0.35860044
## 121 0.35628389 0.35628389
## 122 0.34412143 0.34412143
## 123 0.33337429 0.33337429
## 124 0.32990692 0.32990692
## 125 0.32803616 0.32803616
## 126 0.32173430 0.32173430
## 127 0.31212527 0.31212527
## 128 0.31004838 0.31004838
## 129 0.30877229 0.30877229
## 130 0.30467823 0.30467823
## 131 0.30152494 0.30152494
## 132 0.30002885 0.30002885
## 133 0.29383917 0.29383917
## 134 0.29025286 0.29025286
## 135 0.28791259 0.28791259
## 136 0.28704761 0.28704761
## 137 0.28581780 0.28581780
## 138 0.28076119 0.28076119
## 139 0.27891238 0.27891238
## 140 0.27507589 0.27507589
## 141 0.27495905 0.27495905
## 142 0.27224620 0.27224620
## 143 0.27108478 0.27108478
## 144 0.25548584 0.25548584
## 145 0.25536962 0.25536962
## 146 0.24720368 0.24720368
## 147 0.24604871 0.24604871
## 148 0.23456740 0.23456740
## 149 0.23069618 0.23069618
## 150 0.22278307 0.22278307
## ... [rows 151 through 323 omitted for brevity] ...
## 324 -2.49160754 -2.49160754
if(METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG==6){
importance_rf_final_model <- varImp(rf_model$finalModel)
library(dplyr)
Ordered_importance_rf_final_model <- importance_rf_final_model %>% arrange(desc(Dementia))
print(Ordered_importance_rf_final_model)
}
if(METHOD_FEATURE_FLAG==3){
importance_rf_final_model <- varImp(rf_model$finalModel)
library(dplyr)
Ordered_importance_rf_final_model <- importance_rf_final_model %>% arrange(desc(CI))
print(Ordered_importance_rf_final_model)
}
if(METHOD_FEATURE_FLAG==1){
# For the multiclass case, take for each feature
# the maximum importance value across the classes.
# Add a column holding that maximum importance.
importance_rf_model_df$Feature<-rownames(importance_rf_model_df)
importance_rf_model_df <- importance_rf_model_df %>%
mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
arrange(desc(MaxImportance))
print(importance_rf_model_df)
}
if(METHOD_FEATURE_FLAG == 1){
importance_melted_rf_model_df <- importance_rf_model_df %>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_rf_model_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
if(METHOD_FEATURE_FLAG == 1){
print(importance_rf_model_df %>% head(20))
print("The top 20 features based on maximum class importance:")
print(head(importance_rf_model_df,n=20)$Feature)
importance_melted_rf_model_df <- importance_rf_model_df %>%
head(20)%>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_rf_model_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
if(METHOD_FEATURE_FLAG == 5){
prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
roc_curve <- roc(test_data_RFM1$DX,
prob_predictions[, "MCI"],
levels = rev(levels(test_data_RFM1$DX)))
auc_value <- roc_curve$auc
print(auc_value)
modelTrain_rf_AUC <- auc_value
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls > cases
## Area under the curve: 0.7751
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
roc_curve <- roc(test_data_RFM1$DX,
prob_predictions[, "Dementia"],
levels = rev(levels(test_data_RFM1$DX)))
auc_value <- roc_curve$auc
print(auc_value)
modelTrain_rf_AUC <- auc_value
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
roc_curve <- roc(test_data_RFM1$DX,
prob_predictions[, "CI"],
levels = rev(levels(test_data_RFM1$DX)))
auc_value <- roc_curve$auc
print(auc_value)
modelTrain_rf_AUC <- auc_value
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 1){
prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(test_data_RFM1$DX)
for (class in classes) {
binary_labels <- ifelse(test_data_RFM1$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
plot(roc_curves[[1]], col = "blue",
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = i+1, lwd = 2)
}
legend("bottomright", legend = classes, col = c("blue", seq(3, length(classes) + 1)), lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
modelTrain_rf_AUC <- mean_auc
}
print(modelTrain_rf_AUC)
## Area under the curve: 0.7751
df_SVM<-processed_data
featureName_SVM1<-AfterProcess_FeatureName
trainIndex <- createDataPartition(df_SVM$DX, p = 0.7, list = FALSE)
train_data_SVM1 <- df_SVM[trainIndex, ]
test_data_SVM1 <- df_SVM[-trainIndex, ]
X_train_SVM1 <- subset(train_data_SVM1,select = -DX)
y_train_SVM1 <- train_data_SVM1$DX
X_test_SVM1 <- subset(test_data_SVM1, select= -DX )
y_test_SVM1 <- test_data_SVM1$DX
train_control <- trainControl(method = "cv", number = 5, classProbs = TRUE)
svm_model <- caret::train(DX ~ ., data = train_data_SVM1,
method = "svmRadial",
trControl = train_control)
print(svm_model)
## Support Vector Machines with Radial Basis Function Kernel
##
## 389 samples
## 324 predictors
## 2 classes: 'CN', 'MCI'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 312, 311, 311, 311, 311
## Resampling results across tuning parameters:
##
## C Accuracy Kappa
## 0.25 0.8301698 0.6633146
## 0.50 0.8378954 0.6780763
## 1.00 0.8431235 0.6791570
##
## Tuning parameter 'sigma' was held constant at a value of 0.001579202
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were sigma = 0.001579202 and C = 1.
print(svm_model$bestTune)
## sigma C
## 3 0.001579202 1
mean_accuracy_svm_model<- mean(svm_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_svm_model)
## [1] 0.8370629
modelTrain_mean_accuracy_cv_svm <- mean_accuracy_svm_model
print(modelTrain_mean_accuracy_cv_svm)
## [1] 0.8370629
train_predictions <- predict(svm_model, newdata = train_data_SVM1)
train_accuracy <- mean(train_predictions == train_data_SVM1$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 0.989717223650386"
modelTrain_svm_trainAccuracy <-train_accuracy
print(modelTrain_svm_trainAccuracy)
## [1] 0.9897172
predictions <- predict(svm_model, newdata = test_data_SVM1)
cm_modelTrain_svm <- caret::confusionMatrix(predictions,test_data_SVM1$DX)
print(cm_modelTrain_svm)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CN MCI
## CN 51 6
## MCI 15 93
##
## Accuracy : 0.8727
## 95% CI : (0.8121, 0.9195)
## No Information Rate : 0.6
## P-Value [Acc > NIR] : 1.212e-14
##
## Kappa : 0.7287
##
## Mcnemar's Test P-Value : 0.08086
##
## Sensitivity : 0.7727
## Specificity : 0.9394
## Pos Pred Value : 0.8947
## Neg Pred Value : 0.8611
## Prevalence : 0.4000
## Detection Rate : 0.3091
## Detection Prevalence : 0.3455
## Balanced Accuracy : 0.8561
##
## 'Positive' Class : CN
##
cm_modelTrain_svm_Accuracy <- cm_modelTrain_svm$overall["Accuracy"]
cm_modelTrain_svm_Kappa <- cm_modelTrain_svm$overall["Kappa"]
print(cm_modelTrain_svm_Accuracy)
## Accuracy
## 0.8727273
print(cm_modelTrain_svm_Kappa)
## Kappa
## 0.7286822
Let’s take a look at the feature importance of the trained model.
library(iml)
predictor_SVM <- Predictor$new(svm_model,data = df_SVM,y=df_SVM$DX)
importance_SVM <- FeatureImp$new(predictor_SVM,loss="ce")
print(importance_SVM)
## Interpretation method: FeatureImp
## error function: ce
##
## Analysed predictor:
## Prediction task: classification
## Classes:
##
## Analysed data:
## Sampling from data.frame with 554 rows and 325 columns.
##
##
## Head of results:
## feature importance.05 importance importance.95 permutation.error
## 1 cg05161773 1.048 1.16 1.200 0.05234657
## 2 cg11331837 1.008 1.12 1.152 0.05054152
## 3 cg26705599 0.976 1.12 1.120 0.05054152
## 4 cg12776173 1.048 1.12 1.184 0.05054152
## 5 age.now 1.040 1.08 1.112 0.04873646
## 6 cg05234269 1.048 1.08 1.112 0.04873646
plot(importance_SVM)
library(vip)
vip(svm_model, method = "permute", train = train_data_SVM1, target = "DX", nsim = 10, metric = "bal_accuracy", pred_wrapper = predict)
importance_SVM_df<-importance_SVM$results
if(METHOD_FEATURE_FLAG == 5){
library(e1071)
prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
roc_curve <- roc(test_data_SVM1$DX,
prob_predictions[, "MCI"],
levels = rev(levels(test_data_SVM1$DX)))
print(roc_curve)
print("The AUC value is:")
auc_value <- roc_curve$auc
modelTrain_svm_AUC <- auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls > cases
##
## Call:
## roc.default(response = test_data_SVM1$DX, predictor = prob_predictions[, "MCI"], levels = rev(levels(test_data_SVM1$DX)))
##
## Data: prob_predictions[, "MCI"] in 99 controls (test_data_SVM1$DX MCI) > 66 cases (test_data_SVM1$DX CN).
## Area under the curve: 0.9542
## [1] "The AUC value is:"
## Area under the curve: 0.9542
if(METHOD_FEATURE_FLAG == 4|| METHOD_FEATURE_FLAG==6){
library(e1071)
prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
roc_curve <- roc(test_data_SVM1$DX,
prob_predictions[, "Dementia"],
levels = rev(levels(test_data_SVM1$DX)))
print(roc_curve)
print("The AUC value is:")
auc_value <- roc_curve$auc
modelTrain_svm_AUC <- auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
library(e1071)
prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
roc_curve <- roc(test_data_SVM1$DX,
prob_predictions[, "CI"],
levels = rev(levels(test_data_SVM1$DX)))
print(roc_curve)
print("The AUC value is:")
auc_value <- roc_curve$auc
modelTrain_svm_AUC <- auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 1){
prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(test_data_SVM1$DX)
for (class in classes) {
binary_labels <- ifelse(test_data_SVM1$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
plot(roc_curves[[1]], col = "blue",
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = i+1, lwd = 2)
}
legend("bottomright", legend = classes, col = c("blue", seq(3, length(classes) + 1)), lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
modelTrain_svm_AUC <- mean_auc
}
# GOTO "INPUT" Session to set the Number of common features needed
NUM_COMMON_FEATURES <- NUM_COMMON_FEATURES_SET
The feature importances cannot be combined directly, since they are not all on the same scale; the SVM model, for example, uses a different method (permutation-based importance) than the other models.
So, let’s scale the importances to put them in the same range.
First, let’s process each data frame to ensure they have a consistent format.
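To see why direct combination fails, here is a minimal sketch with made-up importance values on two very different scales; after min-max scaling, both land in [0, 1] and can be averaged per feature:

```r
# Illustrative only: two models report importance on very different scales.
imp_rf  <- c(f1 = 12.0, f2 = 30.0, f3 = 21.0)   # e.g. a Gini-style decrease
imp_svm <- c(f1 = 1.00, f2 = 1.16, f3 = 1.08)   # e.g. a permutation ratio

# Min-max scaling maps each vector onto [0, 1].
min_max <- function(x) (x - min(x)) / (max(x) - min(x))

scaled_rf  <- min_max(imp_rf)
scaled_svm <- min_max(imp_svm)

# Now a per-feature average is meaningful.
print(round((scaled_rf + scaled_svm) / 2, 3))
# f1 = 0.0, f2 = 1.0, f3 = 0.5
```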
if(METHOD_FEATURE_FLAG == 3 || METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG==5 || METHOD_FEATURE_FLAG==6){
# Process the dataframe to ensure they have consistent format.
# SVM
importance_SVM_df_processed<-importance_SVM_df[,c("importance","feature")]
colnames(importance_SVM_df_processed)[colnames(importance_SVM_df_processed) == "feature"] <- "Feature"
colnames(importance_SVM_df_processed)[colnames(importance_SVM_df_processed) == "importance"] <- "Importance_SVM"
head(importance_SVM_df_processed)
# LRM
importance_model_LRM1_df_processed<-importance_model_LRM1_df
importance_model_LRM1_df_processed$Feature<-rownames(importance_model_LRM1_df_processed)
colnames(importance_model_LRM1_df_processed)[colnames(importance_model_LRM1_df_processed) == "Overall"] <- "Importance_LRM1"
head(importance_model_LRM1_df_processed)
# Elastic Net
importance_elastic_net_model1_df_processed<-importance_elastic_net_model1_df
importance_elastic_net_model1_df_processed$Feature<-rownames(importance_elastic_net_model1_df_processed)
colnames(importance_elastic_net_model1_df_processed)[colnames(importance_elastic_net_model1_df_processed) == "Overall"] <- "Importance_ENM1"
head(importance_elastic_net_model1_df_processed)
# XGBoost
importance_xgb_model_df_processed<-importance_xgb_model_df
importance_xgb_model_df_processed$Feature<-rownames(importance_xgb_model_df_processed)
colnames(importance_xgb_model_df_processed)[colnames(importance_xgb_model_df_processed) == "Overall"] <- "Importance_XGB"
head(importance_xgb_model_df_processed)
# RF
importance_rf_model_df_processed <- importance_rf_model_df
if (METHOD_FEATURE_FLAG_NUM == 3){
importance_rf_model_df_processed$Importance <- rowMeans(importance_rf_model_df_processed)
importance_rf_model_df_processed$Feature <- rownames(importance_rf_model_df_processed)
importance_rf_model_df_processed <- subset(importance_rf_model_df_processed, select = -c(CI, CN))
colnames(importance_rf_model_df_processed)[colnames(importance_rf_model_df_processed) == "Importance"] <- "Importance_RF"
}
if (METHOD_FEATURE_FLAG_NUM == 4){
importance_rf_model_df_processed$Importance <- rowMeans(importance_rf_model_df_processed)
importance_rf_model_df_processed$Feature <- rownames(importance_rf_model_df_processed)
importance_rf_model_df_processed <- subset(importance_rf_model_df_processed, select = -c(Dementia, CN))
colnames(importance_rf_model_df_processed)[colnames(importance_rf_model_df_processed) == "Importance"] <- "Importance_RF"
}
if (METHOD_FEATURE_FLAG_NUM == 5){
importance_rf_model_df_processed$Importance <- rowMeans(importance_rf_model_df_processed)
importance_rf_model_df_processed$Feature <- rownames(importance_rf_model_df_processed)
importance_rf_model_df_processed <- subset(importance_rf_model_df_processed, select = -c(MCI, CN))
colnames(importance_rf_model_df_processed)[colnames(importance_rf_model_df_processed) == "Importance"] <- "Importance_RF"
}
if (METHOD_FEATURE_FLAG_NUM == 6){
importance_rf_model_df_processed$Importance <- rowMeans(importance_rf_model_df_processed)
importance_rf_model_df_processed$Feature <- rownames(importance_rf_model_df_processed)
importance_rf_model_df_processed <- subset(importance_rf_model_df_processed, select = -c(MCI, Dementia))
colnames(importance_rf_model_df_processed)[colnames(importance_rf_model_df_processed) == "Importance"] <- "Importance_RF"
}
head(importance_rf_model_df_processed)
}
From the above (binary case), the data frames now share the same structure, with an ‘Importance_*’ column and a ‘Feature’ column.
For the multiclass case, see below. Except for the XGBoost and SVM models, each model’s feature importance is taken as the maximum importance across the classes.
if(METHOD_FEATURE_FLAG == 1){
# Process the dataframe to ensure they have consistent format.
# SVM
importance_SVM_df_processed<-importance_SVM_df[,c("importance","feature")]
colnames(importance_SVM_df_processed)[colnames(importance_SVM_df_processed) == "feature"] <- "Feature"
colnames(importance_SVM_df_processed)[colnames(importance_SVM_df_processed) == "importance"] <- "Importance_SVM"
head(importance_SVM_df_processed)
# LRM
importance_model_LRM1_df_processed<-importance_model_LRM1_df
colnames(importance_model_LRM1_df_processed)[colnames(importance_model_LRM1_df_processed) == "MaxImportance"] <- "Importance_LRM1"
importance_model_LRM1_df_processed <- subset(importance_model_LRM1_df_processed, select = -c(Dementia,MCI, CN))
head(importance_model_LRM1_df_processed)
# Elastic Net
importance_elastic_net_model1_df_processed<-importance_elastic_net_model1_df
importance_elastic_net_model1_df_processed <- subset(importance_elastic_net_model1_df_processed, select = -c(Dementia,MCI, CN))
colnames(importance_elastic_net_model1_df_processed)[colnames(importance_elastic_net_model1_df_processed) == "MaxImportance"] <- "Importance_ENM1"
head(importance_elastic_net_model1_df_processed)
# XGBoost
importance_xgb_model_df_processed<-importance_xgb_model_df
importance_xgb_model_df_processed$Feature<-rownames(importance_xgb_model_df_processed)
colnames(importance_xgb_model_df_processed)[colnames(importance_xgb_model_df_processed) == "Overall"] <- "Importance_XGB"
head(importance_xgb_model_df_processed)
# RF
importance_rf_model_df_processed <- importance_rf_model_df
importance_rf_model_df_processed <- subset(importance_rf_model_df_processed, select = -c(Dementia,MCI, CN))
colnames(importance_rf_model_df_processed)[colnames(importance_rf_model_df_processed) == "MaxImportance"] <- "Importance_RF"
head(importance_rf_model_df_processed)
}
Then, let’s do the scaling; here we choose min-max scaling.
importance_list <- list(logistic = importance_model_LRM1_df_processed,
xgb = importance_xgb_model_df_processed,
elastic_net = importance_elastic_net_model1_df_processed,
rf = importance_rf_model_df_processed,
svm = importance_SVM_df_processed)
min_max_scale_Imp<-function(df){
x<-df[, grepl("Importance_", colnames(df))]
df[, grepl("Importance_", colnames(df))] <- (x - min(x)) / (max(x) - min(x))
return(df)
}
for (i in seq_along(importance_list)) {
importance_list[[i]] <- min_max_scale_Imp(importance_list[[i]])
}
# Print each data frame after scaling
print(head(importance_list[[1]]))
## Importance_LRM1 Feature
## age.now 0.01043952 age.now
## PC1 0.06933081 PC1
## PC2 1.00000000 PC2
## PC3 0.00000000 PC3
## cg18993517 0.08391656 cg18993517
## cg13573375 0.11276353 cg13573375
print(head(importance_list[[2]]))
## Importance_XGB Feature
## cg22535849 1.0000000 cg22535849
## cg11331837 0.8615794 cg11331837
## cg12543766 0.8351779 cg12543766
## cg19799454 0.8006885 cg19799454
## age.now 0.7697314 age.now
## cg04412904 0.7548702 cg04412904
print(head(importance_list[[3]]))
## Importance_ENM1 Feature
## age.now 0.003919221 age.now
## PC1 0.192300124 PC1
## PC2 1.000000000 PC2
## PC3 0.023878393 PC3
## cg18993517 0.171056320 cg18993517
## cg13573375 0.179342809 cg13573375
print(head(importance_list[[4]]))
## Importance_RF Feature
## age.now 1.0000000 age.now
## PC1 0.4657937 PC1
## PC2 0.6034537 PC2
## PC3 0.3225898 PC3
## cg18993517 0.6538849 cg18993517
## cg13573375 0.3324839 cg13573375
print(head(importance_list[[5]]))
## Importance_SVM Feature
## 1 1.0000000 cg05161773
## 2 0.8571429 cg11331837
## 3 0.8571429 cg26705599
## 4 0.8571429 cg12776173
## 5 0.7142857 age.now
## 6 0.7142857 cg05234269
Now, let’s merge the data frames of scaled feature importances.
# Merge all importances
combined_importance <- Reduce(function(x, y) merge(x, y, by = "Feature", all = TRUE), importance_list)
head(combined_importance)
# Replace NA with 0
combined_importance[is.na(combined_importance)] <- 0
# Exclude DX, as it's label
combined_importance <- combined_importance %>%
filter(Feature != "DX")
# View the filtered dataframe
head(combined_importance)
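To see what the merge step does, here is a minimal sketch with made-up importances from two models whose feature sets only partially overlap:

```r
# Hypothetical toy importances from two models.
imp_a <- data.frame(Feature = c("cg1", "cg2"), Importance_A = c(0.9, 0.4))
imp_b <- data.frame(Feature = c("cg2", "cg3"), Importance_B = c(0.7, 0.2))

# Full outer join on Feature keeps every feature seen by any model ...
combined <- Reduce(function(x, y) merge(x, y, by = "Feature", all = TRUE),
                   list(imp_a, imp_b))
# ... and a feature missing from a model gets NA, which we treat as zero importance.
combined[is.na(combined)] <- 0
print(combined)
#   Feature Importance_A Importance_B
# 1     cg1          0.9          0.0
# 2     cg2          0.4          0.7
# 3     cg3          0.0          0.2
```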
Next, we select the top N important features based on average importance (see the following).
combined_importance_AVF <- combined_importance
# Calculate average importance
combined_importance_AVF$Average_Importance <- rowMeans(combined_importance_AVF[,-1])
head(combined_importance_AVF)
combined_importance_Avg_ordered <- combined_importance_AVF[order(-combined_importance_AVF$Average_Importance),]
head(combined_importance_Avg_ordered)
# Top Number of common important features
print("The top number of common features is set to:")
## [1] "The top number of common features is set to:"
print(NUM_COMMON_FEATURES)
## [1] 20
top_Num_combined_importance_Avg_ordered <- head(combined_importance_Avg_ordered,n = NUM_COMMON_FEATURES)
print(top_Num_combined_importance_Avg_ordered)
## Feature Importance_LRM1 Importance_XGB Importance_ENM1 Importance_RF Importance_SVM Average_Importance
## 320 cg27272246 0.62058225 0.44866850 0.565929788 0.6027351 0.7142857 0.5904403
## 323 PC2 1.00000000 0.00000000 1.000000000 0.6034537 0.2857143 0.5778336
## 172 cg12543766 0.48554307 0.83517789 0.530261958 0.4888664 0.4285714 0.5536841
## 256 cg20685672 0.51275876 0.57957571 0.589576401 0.6550619 0.4285714 0.5531088
## 198 cg14687298 0.48481061 0.47904262 0.497558803 0.5180094 0.7142857 0.5387414
## 157 cg11331837 0.16457562 0.86157944 0.308881275 0.4831631 0.8571429 0.5350685
## 217 cg16652920 0.49718690 0.45851096 0.609991512 0.5951502 0.4285714 0.5178822
## 183 cg14168080 0.43877415 0.29804410 0.398184557 0.7064067 0.7142857 0.5111390
## 1 age.now 0.01043952 0.76973141 0.003919221 1.0000000 0.7142857 0.4996752
## 56 cg04248279 0.35820583 0.44127057 0.491761079 0.5912281 0.5714286 0.4907788
## 289 cg24433124 0.48908534 0.21617286 0.541726622 0.4774113 0.7142857 0.4877364
## 58 cg04412904 0.14174716 0.75487022 0.415016324 0.6881251 0.4285714 0.4856660
## 102 cg07028768 0.36540268 0.41359475 0.603713475 0.6109097 0.4285714 0.4844384
## 199 cg14710850 0.56883983 0.00000000 0.542193761 0.5411461 0.7142857 0.4732931
## 123 cg08861434 0.41494827 0.38600385 0.368078508 0.5557482 0.5714286 0.4592415
## 97 cg06833284 0.47253884 0.10535920 0.548933823 0.5911471 0.5714286 0.4578815
## 52 cg03924089 0.50772161 0.08746406 0.568040785 0.6756981 0.4285714 0.4534992
## 253 cg20398163 0.38499658 0.36292236 0.371075361 0.5702638 0.5714286 0.4521373
## 2 cg00004073 0.55107624 0.00000000 0.444329028 0.6879460 0.5714286 0.4509560
## 16 cg00962106 0.47211353 0.44719122 0.569408229 0.3343471 0.4285714 0.4503263
# Top Number of common important features' name
top_Num_combined_importance_Avg_ordered_Nam <- top_Num_combined_importance_Avg_ordered$Feature
print(top_Num_combined_importance_Avg_ordered_Nam)
## [1] "cg27272246" "PC2" "cg12543766" "cg20685672" "cg14687298" "cg11331837" "cg16652920" "cg14168080" "age.now" "cg04248279" "cg24433124" "cg04412904" "cg07028768" "cg14710850" "cg08861434"
## [16] "cg06833284" "cg03924089" "cg20398163" "cg00004073" "cg00962106"
Visualization with a bar plot of the average feature importances:
ggplot(combined_importance_Avg_ordered, aes(x = reorder(Feature, Average_Importance), y = Average_Importance)) +
geom_bar(stat = "identity") +
coord_flip() + # Flip coordinates to make it horizontal
labs(title = "Feature Importance Sorted by Average Value",
x = "Feature",
y = "Average Importance") +
theme_minimal()
Visualization with a bar plot of the top features’ average importances:
ggplot(top_Num_combined_importance_Avg_ordered, aes(x = reorder(Feature, Average_Importance), y = Average_Importance)) +
geom_bar(stat = "identity") +
coord_flip() +
labs(title = paste("Top",NUM_COMMON_FEATURES,"Feature Importance Sorted by Average Value"),
x = "Feature",
y = "Average Importance") +
theme_minimal()
The following shows how to select the top N important features based on a specific quantile of importance (here we use the median, i.e. the 50% quantile).
Let’s create a new data frame with several quantiles of feature importance across the models, order it by the 50% quantile from high to low, and select the top features from that.
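As a toy illustration of the quantile summary (made-up scaled importances for a single feature across the five models), `quantile()` returns the five summary points used below; the 50% entry is the median, which is robust to a single model rating a feature very high or very low:

```r
# Hypothetical scaled importances of one feature across the five models.
one_feature <- c(LRM = 0.10, XGB = 0.80, ENM = 0.20, RF = 0.60, SVM = 0.40)

# The 50% quantile (median) ignores the extreme XGB and LRM ratings.
q <- quantile(one_feature, probs = c(0, 0.25, 0.5, 0.75, 1))
print(q)
#   0%  25%  50%  75% 100%
#  0.1  0.2  0.4  0.6  0.8
```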
quantiles <- t(apply(combined_importance[,-1], 1, function(x) quantile(x, probs = c(0,0.25, 0.5, 0.75,1))))
combined_importance_quantiles <- cbind(Feature = combined_importance$Feature, quantiles)
combined_importance_quantiles <- as.data.frame(combined_importance_quantiles)
combined_importance_quantiles$`50%` <- as.numeric(combined_importance_quantiles$`50%`)
combined_importance_quantiles$`0%` <- as.numeric(combined_importance_quantiles$`0%`)
combined_importance_quantiles$`25%` <- as.numeric(combined_importance_quantiles$`25%`)
combined_importance_quantiles$`75%` <- as.numeric(combined_importance_quantiles$`75%`)
combined_importance_quantiles$`100%` <- as.numeric(combined_importance_quantiles$`100%`)
# Sort by median importance (50th percentile)
combined_importance_quantiles <- combined_importance_quantiles[order(-combined_importance_quantiles$`50%`), ]
head(combined_importance_quantiles)
top_Num_median_features_imp <- head(combined_importance_quantiles,n = NUM_COMMON_FEATURES)
print(top_Num_median_features_imp)
## Feature 0% 25% 50% 75% 100%
## 1 age.now 0.003919221 0.01043952 0.7142857 0.7697314 1.0000000
## 323 PC2 0.000000000 0.28571429 0.6034537 1.0000000 1.0000000
## 320 cg27272246 0.448668503 0.56592979 0.6027351 0.6205822 0.7142857
## 256 cg20685672 0.428571429 0.51275876 0.5795757 0.5895764 0.6550619
## 282 cg23432430 0.083051637 0.28571429 0.5562570 0.5900073 0.6397029
## 2 cg00004073 0.000000000 0.44432903 0.5510762 0.5714286 0.6879460
## 97 cg06833284 0.105359201 0.47253884 0.5489338 0.5714286 0.5911471
## 199 cg14710850 0.000000000 0.54114610 0.5421938 0.5688398 0.7142857
## 52 cg03924089 0.087464057 0.42857143 0.5077216 0.5680408 0.6756981
## 178 cg13405878 0.025010289 0.42857143 0.5065871 0.5334443 0.5462371
## 138 cg10240127 0.058891264 0.40748150 0.5008917 0.5335498 0.7142857
## 198 cg14687298 0.479042622 0.48481061 0.4975588 0.5180094 0.7142857
## 217 cg16652920 0.428571429 0.45851096 0.4971869 0.5951502 0.6099915
## 56 cg04248279 0.358205829 0.44127057 0.4917611 0.5714286 0.5912281
## 289 cg24433124 0.216172856 0.47741131 0.4890853 0.5417266 0.7142857
## 172 cg12543766 0.428571429 0.48554307 0.4888664 0.5302620 0.8351779
## 157 cg11331837 0.164575621 0.30888127 0.4831631 0.8571429 0.8615794
## 226 cg17129965 0.010719079 0.43190090 0.4824708 0.5265814 0.5714286
## 26 cg02225060 0.285714286 0.37869301 0.4699868 0.4931918 0.5782728
## 122 cg08857872 0.015229990 0.23415135 0.4670717 0.5096105 0.5714286
top_Num_median_features_Name<-top_Num_median_features_imp$Feature
print(top_Num_median_features_Name)
## [1] "age.now" "PC2" "cg27272246" "cg20685672" "cg23432430" "cg00004073" "cg06833284" "cg14710850" "cg03924089" "cg13405878" "cg10240127" "cg14687298" "cg16652920" "cg04248279" "cg24433124"
## [16] "cg12543766" "cg11331837" "cg17129965" "cg02225060" "cg08857872"
Visualization with a box plot:
library(tidyr)
long_df <- pivot_longer(combined_importance_quantiles,
cols = c(`0%`, `25%`, `50%`, `75%`, `100%`),
names_to = "Quantile",
values_to = "Importance")
ggplot(long_df, aes(x = reorder(Feature, Importance), y = Importance)) +
geom_boxplot() +
coord_flip() +
labs(title = "Distribution of Feature Importances",
x = "Feature",
y = "Importance") +
theme_minimal()
Visualization of the top features with a box plot:
library(tidyr)
long_df <- pivot_longer(top_Num_median_features_imp,
cols = c(`0%`, `25%`, `50%`, `75%`, `100%`),
names_to = "Quantile",
values_to = "Importance")
ggplot(long_df, aes(x = reorder(Feature, Importance), y = Importance)) +
geom_boxplot() +
coord_flip() +
labs(
title = paste("Distribution of Top",NUM_COMMON_FEATURES,"Feature Importance Sorted by Median Value"),
x = "Feature",
y = "Importance") +
theme_minimal()
The frequency / common-feature importance selection proceeds as follows:
n_select_frequencyWay <- NUM_COMMON_FEATURES_SET_Frequency
combined_importance_freq_ordered_df<-combined_importance_Avg_ordered
# LRM
## All_impAvg_orderby_LRM
All_impAvg_orderby_LRM <- combined_importance_freq_ordered_df[order(-combined_importance_freq_ordered_df$Importance_LRM1),]
## top_impAvg_orderby_LRM
top_impAvg_orderby_LRM <- head(All_impAvg_orderby_LRM,n = n_select_frequencyWay)
top_impAvg_orderby_LRM_NAME <- top_impAvg_orderby_LRM$Feature
# XGB
## All_impAvg_orderby_XGB
All_impAvg_orderby_XGB <- combined_importance_freq_ordered_df[order(-combined_importance_freq_ordered_df$Importance_XGB),]
## top_impAvg_orderby_XGB
top_impAvg_orderby_XGB <- head(All_impAvg_orderby_XGB,n = n_select_frequencyWay)
top_impAvg_orderby_XGB_NAME <- top_impAvg_orderby_XGB$Feature
# ENM
## all_impAvg_orderby_ENM
All_impAvg_orderby_ENM <- combined_importance_freq_ordered_df[order(-combined_importance_freq_ordered_df$Importance_ENM1),]
## top_impAvg_orderby_ENM
top_impAvg_orderby_ENM <- head(All_impAvg_orderby_ENM,n = n_select_frequencyWay)
top_impAvg_orderby_ENM_NAME <- top_impAvg_orderby_ENM$Feature
# RF
## all_impAvg_orderby_RF
All_impAvg_orderby_RF <- combined_importance_freq_ordered_df[order(-combined_importance_freq_ordered_df$Importance_RF),]
## top_impAvg_orderby_RF
top_impAvg_orderby_RF <- head(All_impAvg_orderby_RF,n = n_select_frequencyWay)
top_impAvg_orderby_RF_NAME <- top_impAvg_orderby_RF$Feature
# SVM
## all_impAvg_orderby_SVM
All_impAvg_orderby_SVM <- combined_importance_freq_ordered_df[order(-combined_importance_freq_ordered_df$Importance_SVM),]
## top_impAvg_orderby_SVM
top_impAvg_orderby_SVM <- head(All_impAvg_orderby_SVM,n = n_select_frequencyWay)
top_impAvg_orderby_SVM_NAME <- top_impAvg_orderby_SVM$Feature
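The five near-identical blocks above can be collapsed into a single loop over the importance columns. A minimal sketch with toy data (the feature names and values below are hypothetical, only the column-name pattern mirrors the tables above):

```r
# Toy importance table mirroring combined_importance_freq_ordered_df (hypothetical values)
imp_df <- data.frame(
  Feature = c("cg1", "cg2", "cg3", "cg4"),
  Importance_LRM1 = c(0.9, 0.2, 0.5, 0.7),
  Importance_XGB  = c(0.1, 0.8, 0.6, 0.3)
)
n_select <- 2
imp_cols <- c("Importance_LRM1", "Importance_XGB")
# For each model's column, take the names of the top-n features by that column
top_names <- lapply(imp_cols, function(col) {
  head(imp_df[order(-imp_df[[col]]), "Feature"], n_select)
})
names(top_names) <- imp_cols
print(top_names$Importance_LRM1)  # "cg1" "cg4"
```

This keeps the per-model results in one named list instead of ten separate variables, which also makes the later presence/absence step easier to write.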
# Combine all features into a unique collection
all_features <- unique(c(top_impAvg_orderby_LRM_NAME, top_impAvg_orderby_XGB_NAME, top_impAvg_orderby_ENM_NAME,top_impAvg_orderby_RF_NAME,top_impAvg_orderby_SVM_NAME))
models<-c("LRM","XGB","ENM","RF","SVM")
feature_matrix <- matrix(0, nrow = length(all_features), ncol = length(models),
dimnames = list(all_features, models))
# Fill the matrix, indicating presence (1) or absence (0) of each feature in each model's top list
for (feature in all_features) {
feature_matrix[feature, "LRM"] <-
as.integer(feature %in% top_impAvg_orderby_LRM_NAME)
feature_matrix[feature, "XGB"] <-
as.integer(feature %in% top_impAvg_orderby_XGB_NAME)
feature_matrix[feature, "ENM"] <-
as.integer(feature %in% top_impAvg_orderby_ENM_NAME)
feature_matrix[feature, "RF"] <-
as.integer(feature %in% top_impAvg_orderby_RF_NAME)
feature_matrix[feature, "SVM"] <-
as.integer(feature %in% top_impAvg_orderby_SVM_NAME)
}
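The fill loop above can also be vectorized with sapply, assuming the per-model top-name vectors are collected in a list (the CpG names below are toy placeholders):

```r
# Toy top-feature lists per model (hypothetical CpG names)
top_lists <- list(
  LRM = c("cg1", "cg2"),
  XGB = c("cg2", "cg3")
)
all_features <- unique(unlist(top_lists))
# sapply builds one 0/1 column per model in a single step
feature_matrix <- sapply(top_lists, function(nm) as.integer(all_features %in% nm))
rownames(feature_matrix) <- all_features
print(feature_matrix)
```

The result is the same 0/1 matrix, built column by column rather than cell by cell.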
feature_df <- as.data.frame(feature_matrix)
print(head(feature_df))
## LRM XGB ENM RF SVM
## PC2 1 0 1 0 0
## cg27272246 1 1 1 0 1
## cg14710850 1 0 1 0 1
## cg23432430 1 0 1 0 0
## cg00004073 1 0 0 1 0
## cg13405878 1 0 1 0 0
For a quick overview, we count how many models selected each feature by taking the row sums and appending the count as a new column of our data frame.
feature_df$Total_Count <- rowSums(feature_df[,1:5])
feature_df <- feature_df[order(-feature_df$Total_Count), ]
frequency_feature_df_RAW_ordered<-feature_df
print(feature_df)
## LRM XGB ENM RF SVM Total_Count
## cg27272246 1 1 1 0 1 4
## cg14687298 1 1 1 0 1 4
## cg14710850 1 0 1 0 1 3
## cg20685672 1 1 1 0 0 3
## cg16652920 1 1 1 0 0 3
## cg24433124 1 0 1 0 1 3
## cg12543766 1 1 1 0 0 3
## age.now 0 1 0 1 1 3
## PC2 1 0 1 0 0 2
## cg23432430 1 0 1 0 0 2
## cg00004073 1 0 0 1 0 2
## cg13405878 1 0 1 0 0 2
## cg02981548 1 0 1 0 0 2
## cg03924089 1 0 1 0 0 2
## cg07480955 1 0 1 0 0 2
## cg17129965 1 0 1 0 0 2
## cg11331837 0 1 0 0 1 2
## cg04412904 0 1 0 1 0 2
## cg10240127 0 1 0 0 1 2
## cg08880261 0 1 0 1 0 2
## cg14168080 0 0 0 1 1 2
## cg14582632 1 0 0 0 0 1
## cg08788093 1 0 0 0 0 1
## cg21243064 1 0 0 0 0 1
## cg02225060 1 0 0 0 0 1
## cg19471911 1 0 0 0 0 1
## cg22535849 0 1 0 0 0 1
## cg19799454 0 1 0 0 0 1
## cg06961873 0 1 0 0 0 1
## cg23836570 0 1 0 0 0 1
## cg20078646 0 1 0 0 0 1
## cg04316537 0 1 0 0 0 1
## cg18285382 0 1 0 0 0 1
## cg04718469 0 1 0 0 0 1
## PC1 0 1 0 0 0 1
## cg02621446 0 1 0 0 0 1
## cg07028768 0 0 1 0 0 1
## cg00962106 0 0 1 0 0 1
## cg09015880 0 0 1 0 0 1
## cg00086247 0 0 1 0 0 1
## cg06833284 0 0 1 0 0 1
## cg06634367 0 0 1 0 0 1
## cg06286533 0 0 0 1 0 1
## cg03749159 0 0 0 1 0 1
## cg18526121 0 0 0 1 0 1
## cg24851651 0 0 0 1 0 1
## cg23658987 0 0 0 1 0 1
## cg26081710 0 0 0 1 0 1
## cg27086157 0 0 0 1 0 1
## cg15501526 0 0 0 1 0 1
## cg10864200 0 0 0 1 0 1
## cg02464073 0 0 0 1 0 1
## cg09289202 0 0 0 1 0 1
## cg18819889 0 0 0 1 0 1
## cg12228670 0 0 0 1 0 1
## cg27160885 0 0 0 1 0 1
## cg03979311 0 0 0 1 0 1
## cg05161773 0 0 0 0 1 1
## cg26705599 0 0 0 0 1 1
## cg12776173 0 0 0 0 1 1
## cg07640670 0 0 0 0 1 1
## cg00767423 0 0 0 0 1 1
## cg10978526 0 0 0 0 1 1
## cg06546677 0 0 0 0 1 1
## cg26901661 0 0 0 0 1 1
## cg02772171 0 0 0 0 1 1
## cg06115838 0 0 0 0 1 1
## cg07478795 0 0 0 0 1 1
## cg12784167 0 0 0 0 1 1
all_features <- union(combined_importance_freq_ordered_df$Feature, rownames(feature_df))
# Note: the combined importance table used here is the one before filtering
# Combine them according to the common-feature selection method
# If a feature from the earlier importance table is absent here, add it with value zero
feature_df_full <- data.frame(Feature = all_features)
feature_df_full <- merge(feature_df_full, feature_df, by.x = "Feature", by.y = "row.names", all.x = TRUE)
feature_df_full[is.na(feature_df_full)] <- 0
# For top_impAvg_ordered
all_impAvg_ordered_full <- data.frame(Feature = all_features)
all_impAvg_ordered_full <- merge(combined_importance_freq_ordered_df,all_impAvg_ordered_full, by.x = "Feature", by.y = "Feature", all.x = TRUE)
all_impAvg_ordered_full[is.na(all_impAvg_ordered_full)] <- 0
all_combined_df_impAvg <- merge(feature_df_full, all_impAvg_ordered_full, by = "Feature", all = TRUE)
print(head(feature_df_full))
## Feature LRM XGB ENM RF SVM Total_Count
## 1 age.now 0 1 0 1 1 3
## 2 cg00004073 1 0 0 1 0 2
## 3 cg00084271 0 0 0 0 0 0
## 4 cg00086247 0 0 1 0 0 1
## 5 cg00154902 0 0 0 0 0 0
## 6 cg00247094 0 0 0 0 0 0
print(head(all_impAvg_ordered_full))
## Feature Importance_LRM1 Importance_XGB Importance_ENM1 Importance_RF Importance_SVM Average_Importance
## 1 age.now 0.01043952 0.7697314 0.003919221 1.0000000 0.7142857 0.4996752
## 2 cg00004073 0.55107624 0.0000000 0.444329028 0.6879460 0.5714286 0.4509560
## 3 cg00084271 0.28081441 0.3024610 0.225486534 0.2954690 0.4285714 0.3065605
## 4 cg00086247 0.41379132 0.1954000 0.558326182 0.4872834 0.0000000 0.3309602
## 5 cg00154902 0.23356839 0.0192178 0.334486044 0.2903668 0.4285714 0.2612421
## 6 cg00247094 0.00000000 0.0000000 0.192524686 0.3183556 0.4285714 0.1878903
print(head(all_combined_df_impAvg))
## Feature LRM XGB ENM RF SVM Total_Count Importance_LRM1 Importance_XGB Importance_ENM1 Importance_RF Importance_SVM Average_Importance
## 1 age.now 0 1 0 1 1 3 0.01043952 0.7697314 0.003919221 1.0000000 0.7142857 0.4996752
## 2 cg00004073 1 0 0 1 0 2 0.55107624 0.0000000 0.444329028 0.6879460 0.5714286 0.4509560
## 3 cg00084271 0 0 0 0 0 0 0.28081441 0.3024610 0.225486534 0.2954690 0.4285714 0.3065605
## 4 cg00086247 0 0 1 0 0 1 0.41379132 0.1954000 0.558326182 0.4872834 0.0000000 0.3309602
## 5 cg00154902 0 0 0 0 0 0 0.23356839 0.0192178 0.334486044 0.2903668 0.4285714 0.2612421
## 6 cg00247094 0 0 0 0 0 0 0.00000000 0.0000000 0.192524686 0.3183556 0.4285714 0.1878903
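The merge-and-zero-fill step above follows a common base-R pattern: a left join keyed on row names, with the unmatched rows coming back as NA and then being set to zero. A small self-contained sketch (toy rows, hypothetical names):

```r
# Toy frequency table keyed by row names, and a fuller feature list
feature_df <- data.frame(Total_Count = c(3, 1), row.names = c("cg1", "cg2"))
all_features <- c("cg1", "cg2", "cg3")
# all.x = TRUE keeps every feature in all_features (a left join)
full <- merge(data.frame(Feature = all_features), feature_df,
              by.x = "Feature", by.y = "row.names", all.x = TRUE)
# Features absent from feature_df come back as NA; treat them as count zero
full[is.na(full)] <- 0
print(full)
```

Here "cg3" has no row in feature_df, so its Total_Count is NA after the merge and 0 after the fill, exactly as in the full pipeline.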
A feature is kept as a mutual (common) important feature when it appears in the top-selected lists of at least half of the models (i.e., 3 of 5 in our case).
if(METHOD_FEATURE_FLAG == 3 || METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG == 5 || METHOD_FEATURE_FLAG == 6){
df_process_mutual_FeatureName <- rownames(feature_df[feature_df$Total_Count >= 3,])
df_process_mutual <- processed_data[,c("DX", df_process_mutual_FeatureName)]
print(paste("Number of features selected by the common-importance method:", length(df_process_mutual) - 1))
}
## [1] "Number of features selected by the common-importance method: 8"
if(METHOD_FEATURE_FLAG == 1){
df_process_mutual_FeatureName <- rownames(feature_df[feature_df$Total_Count >= 3,])
df_process_mutual <- processed_data_m1[,c("DX", df_process_mutual_FeatureName)]
print(paste("Number of features selected by the common-importance method:", length(df_process_mutual) - 1))
}
print(df_process_mutual_FeatureName)
## [1] "cg27272246" "cg14687298" "cg14710850" "cg20685672" "cg16652920" "cg24433124" "cg12543766" "age.now"
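The hard-coded cutoff of 3 corresponds to "at least half of the models". A sketch that derives the threshold from the number of models, so the rule stays correct if models are added or removed (variable names assumed to match the ones above):

```r
models <- c("LRM", "XGB", "ENM", "RF", "SVM")
# "At least half of the models" as an integer threshold: ceiling(5 / 2) = 3
count_threshold <- ceiling(length(models) / 2)
print(count_threshold)  # 3
# The selection then becomes:
# df_process_mutual_FeatureName <- rownames(feature_df[feature_df$Total_Count >= count_threshold, ])
```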
Importance of these features:
Top_Frequency_Feature_importance <- combined_importance_freq_ordered_df[
combined_importance_freq_ordered_df$Feature %in% df_process_mutual_FeatureName,
]
print(Top_Frequency_Feature_importance)
## Feature Importance_LRM1 Importance_XGB Importance_ENM1 Importance_RF Importance_SVM Average_Importance
## 320 cg27272246 0.62058225 0.4486685 0.565929788 0.6027351 0.7142857 0.5904403
## 172 cg12543766 0.48554307 0.8351779 0.530261958 0.4888664 0.4285714 0.5536841
## 256 cg20685672 0.51275876 0.5795757 0.589576401 0.6550619 0.4285714 0.5531088
## 198 cg14687298 0.48481061 0.4790426 0.497558803 0.5180094 0.7142857 0.5387414
## 217 cg16652920 0.49718690 0.4585110 0.609991512 0.5951502 0.4285714 0.5178822
## 1 age.now 0.01043952 0.7697314 0.003919221 1.0000000 0.7142857 0.4996752
## 289 cg24433124 0.48908534 0.2161729 0.541726622 0.4774113 0.7142857 0.4877364
## 199 cg14710850 0.56883983 0.0000000 0.542193761 0.5411461 0.7142857 0.4732931
ggplot(Top_Frequency_Feature_importance, aes(x = reorder(Feature, Average_Importance), y = Average_Importance)) +
geom_bar(stat = "identity") +
coord_flip() +
labs(title = "Feature Importance Selected by the Frequency Method and Sorted by Average Value",
x = "Feature",
y = "Average Importance") +
theme_minimal()
# Check whether all features from the mutual (frequency) method are also in the mean method, and print any that are not
all(df_process_mutual_FeatureName %in% top_Num_combined_importance_Avg_ordered_Nam)
## [1] TRUE
Mutual_not_in_Mean <- setdiff(df_process_mutual_FeatureName, top_Num_combined_importance_Avg_ordered_Nam)
print(Mutual_not_in_Mean)
## character(0)
Phenotype part data frame: “phenoticPart_RAW”
RAW merged data frame: “merged_df_raw”
Feature importance ordered by quantile: “combined_importance_quantiles”
Feature importance ordered by mean: “combined_importance_Avg_ordered”
Feature frequency / common data frames:
“frequency_feature_df_RAW_ordered”: the selected features’ frequencies, ordered by total frequency count.
“feature_df_full”: the frequencies of all features from the steps of the frequency method; not ordered.
“all_combined_df_impAvg”: the combined table of frequency and feature importance; not ordered.
head(phenoticPart_RAW)
#
# save(NUM_COMMON_FEATURES,
# combined_importance_quantiles,
# combined_importance_Avg_ordered,
# frequency_feature_df_RAW_ordered,
# top_Num_median_features_Name,
# top_Num_combined_importance_Avg_ordered_Nam,
# file = "Part2_V8_08_top_features_5KCpGs.RData")
#
# save(processed_data_m3,processed_data_m3_df,AfterProcess_FeatureName_m3,file = "Part2_V8_08_BinaryMerged_5KCpGs.RData")
#
# save(phenoticPart_RAW, merged_df_raw, file = "PhenotypeAndMerged.RData")
The feature selection method flags:
Number_fea_input <- INPUT_NUMBER_FEATURES
Flag_8mean <- INPUT_Method_Mean_Choose
Flag_8median <- INPUT_Method_Median_Choose
Flag_8Fequency <- INPUT_Method_Frequency_Choose
print(paste("The top number of features is set to:", Number_fea_input))
## [1] "The top number of features is set to: 250"
Flag_8mean
## [1] TRUE
Flag_8median
## [1] TRUE
Flag_8Fequency
## [1] TRUE
selected_impAvg_ordered <- head(combined_importance_Avg_ordered,n = Number_fea_input)
print(head(selected_impAvg_ordered))
## Feature Importance_LRM1 Importance_XGB Importance_ENM1 Importance_RF Importance_SVM Average_Importance
## 320 cg27272246 0.6205822 0.4486685 0.5659298 0.6027351 0.7142857 0.5904403
## 323 PC2 1.0000000 0.0000000 1.0000000 0.6034537 0.2857143 0.5778336
## 172 cg12543766 0.4855431 0.8351779 0.5302620 0.4888664 0.4285714 0.5536841
## 256 cg20685672 0.5127588 0.5795757 0.5895764 0.6550619 0.4285714 0.5531088
## 198 cg14687298 0.4848106 0.4790426 0.4975588 0.5180094 0.7142857 0.5387414
## 157 cg11331837 0.1645756 0.8615794 0.3088813 0.4831631 0.8571429 0.5350685
print(dim(selected_impAvg_ordered))
## [1] 250 7
selected_impAvg_ordered_NAME <- selected_impAvg_ordered$Feature
print(head(selected_impAvg_ordered_NAME))
## [1] "cg27272246" "PC2" "cg12543766" "cg20685672" "cg14687298" "cg11331837"
df_selected_Mean <- processed_dataFrame[,c("DX",selected_impAvg_ordered_NAME)]
print(head(df_selected_Mean))
## DX cg27272246 PC2 cg12543766 cg20685672 cg14687298 cg11331837 cg16652920 cg14168080 age.now cg04248279 cg24433124 cg04412904 cg07028768 cg14710850 cg08861434 cg06833284
## 200223270003_R02C01 MCI 0.8615873 0.01470293 0.51028134 0.6712101 0.04206702 0.03692842 0.9436000 0.4190123 82.4 0.8534976 0.1316610 0.05088595 0.4496851 0.8048592 0.8768306 0.9125144
## 200223270003_R03C01 CN 0.8705287 0.05745834 0.88741539 0.7932091 0.14813581 0.57150125 0.9431222 0.4420256 78.6 0.8458854 0.5987648 0.07717659 0.8536078 0.8090950 0.4352647 0.9003482
## 200223270003_R06C01 CN 0.8103777 0.08372861 0.02818501 0.6613646 0.24260002 0.03182862 0.9457161 0.4355521 80.4 0.8332786 0.8188082 0.08253743 0.8356936 0.8285902 0.8698813 0.6097933
## cg03924089 cg20398163 cg00004073 cg00962106 cg10240127 cg06634367 cg02225060 cg04971651 cg09015880 cg19799454 cg03979311 cg07640670 cg08198851 cg02981548 cg11169344 cg06961873
## 200223270003_R02C01 0.7920449 0.1728144 0.02928535 0.9124898 0.9250553 0.8695793 0.6828159 0.8902474 0.5101716 0.9178930 0.86644909 0.58296513 0.6578905 0.1342571 0.6720163 0.5335591
## 200223270003_R03C01 0.7370283 0.8728944 0.02787198 0.5375751 0.9403255 0.9512930 0.8265195 0.9219452 0.8402106 0.9106247 0.06199853 0.55225610 0.6578186 0.5220037 0.8215477 0.5472606
## 200223270003_R06C01 0.8506756 0.2623391 0.64576463 0.5040948 0.9056974 0.9544163 0.5209552 0.9035233 0.8472063 0.9066551 0.72615553 0.04058533 0.1272153 0.5098965 0.5941114 0.9415177
## cg23432430 cg06483046 cg07480955 cg02621446 cg26081710 cg00767423 cg22741595 cg13405878 cg10978526 cg08880261 cg22535849 cg06546677 cg20078646 cg17129965 cg08779649 cg23836570
## 200223270003_R02C01 0.9482702 0.04383925 0.3874638 0.8731313 0.8751040 0.9298253 0.6525533 0.4549662 0.5671930 0.40655904 0.8847704 0.4472216 0.06198170 0.8972140 0.44449401 0.58688450
## 200223270003_R03C01 0.9455418 0.50720277 0.3916889 0.8095534 0.9198212 0.2651854 0.1730013 0.7858042 0.9095713 0.85616966 0.8609966 0.8484609 0.89537412 0.8806673 0.45076825 0.54259383
## 200223270003_R06C01 0.9418716 0.89604910 0.4043390 0.7511582 0.8801892 0.8667808 0.1550739 0.7583938 0.8945157 0.03280808 0.8808022 0.5636023 0.08725521 0.8857237 0.04810217 0.03267304
## cg15633912 cg23517115 cg26705599 cg18285382 cg18819889 cg23352245 cg12228670 cg26901661 cg02772171 cg06286533 cg07104639 cg17042243 cg06115838 cg15098922 cg07478795 cg08788093
## 200223270003_R02C01 0.1605530 0.2151144 0.8585917 0.3202927 0.9156157 0.9377232 0.8632174 0.8951971 0.9182018 0.2734841 0.6772717 0.2502905 0.8847724 0.9286092 0.8911007 0.03911678
## 200223270003_R03C01 0.9333421 0.9131440 0.8613854 0.2930577 0.9004455 0.9375774 0.8496212 0.8754981 0.5660559 0.9354924 0.7123879 0.2933475 0.8447916 0.9027517 0.9095543 0.60934160
## 200223270003_R06C01 0.8737362 0.8328364 0.4332832 0.8923595 0.9054439 0.5932742 0.8738949 0.9021064 0.8995479 0.8696546 0.8099688 0.2725457 0.8805585 0.8525611 0.8905903 0.88380243
## cg12784167 cg26219488 cg22071943 cg21415084 cg01921484 cg02887598 cg18526121 cg02631626 cg09289202 cg23066280 cg08857872 cg00819121 cg07504457 cg11438323 cg07158503 cg19471911
## 200223270003_R02C01 0.81503498 0.9336638 0.8705217 0.8374415 0.9098550 0.04020908 0.4519781 0.6280766 0.4361103 0.07247841 0.3395280 0.9207001 0.7116230 0.4863471 0.5777146 0.6334393
## 200223270003_R03C01 0.02811410 0.9134707 0.2442648 0.8509420 0.9093137 0.67073881 0.4762313 0.1951736 0.4397504 0.57174588 0.8181845 0.9281472 0.6854539 0.8984559 0.6203543 0.8437175
## 200223270003_R06C01 0.03073269 0.9261878 0.2644581 0.8378237 0.9204487 0.73408417 0.4833367 0.2699849 0.4193555 0.80814756 0.2970779 0.9327211 0.7205633 0.8722772 0.6236025 0.6127952
## cg14564293 cg18816397 cg27086157 PC1 cg03749159 cg21783012 cg09584650 cg21243064 cg06231502 cg00696044 cg14175932 cg04242342 cg10738049 cg15501526 cg21392220 cg00322003
## 200223270003_R02C01 0.52089591 0.5472925 0.9224112 -0.214185447 0.9355921 0.9142369 0.08230254 0.5191606 0.7784451 0.55608424 0.5746953 0.8206769 0.5441211 0.6362531 0.8726204 0.1759911
## 200223270003_R03C01 0.04000662 0.4940355 0.9219304 -0.172761185 0.9153921 0.6694884 0.09661586 0.9167649 0.7964278 0.07552381 0.8779027 0.8167892 0.5232715 0.6319253 0.8563905 0.5702070
## 200223270003_R06C01 0.04959460 0.5337018 0.3224986 -0.003667305 0.9255807 0.9070112 0.52399749 0.4862205 0.7706160 0.79270858 0.7288239 0.8040357 0.4875473 0.7435100 0.8466199 0.3077122
## cg05234269 cg16779438 cg14293999 cg03723481 cg06118351 cg00086247 cg15138543 cg18918831 cg12702014 cg25598710 cg10681981 cg01128042 cg03395511 cg22933800 cg16655091 cg17018422
## 200223270003_R02C01 0.93848584 0.8826150 0.2836710 0.4347333 0.3633940 0.1761275 0.7734778 0.4891660 0.7704049 0.3105752 0.7035090 0.9113420 0.4491605 0.4830774 0.6055295 0.5262747
## 200223270003_R03C01 0.57461229 0.5466924 0.9172023 0.9007774 0.4714860 0.2045043 0.2949313 0.5333801 0.7848681 0.3088142 0.7382662 0.5328806 0.4835967 0.4142525 0.7053336 0.9029604
## 200223270003_R06C01 0.02467208 0.8629492 0.9168166 0.8947417 0.8655962 0.6901217 0.2496147 0.6406575 0.8065993 0.8538820 0.6971989 0.5222757 0.5523959 0.3956683 0.8724479 0.5100750
## cg14228103 cg11019791 cg19097407 cg23658987 cg08138245 cg24139837 cg14507637 cg04316537 cg12776173 cg20300784 cg17429539 cg06394820 cg21388339 cg05130642 cg12953206 cg15600437
## 200223270003_R02C01 0.9141064 0.8112324 0.1417931 0.79757644 0.8115760 0.07404605 0.9051258 0.8074830 0.1038804 0.86585964 0.7860900 0.8513195 0.2756268 0.8575504 0.2364836 0.4885353
## 200223270003_R03C01 0.8591302 0.7831231 0.8367297 0.07511718 0.1109940 0.04183445 0.9009460 0.8453340 0.8730635 0.86609999 0.7100923 0.8695521 0.2102269 0.8644077 0.2338141 0.4894487
## 200223270003_R06C01 0.1834348 0.4353250 0.2276425 0.10177571 0.7444698 0.05657120 0.9013686 0.4351695 0.7009491 0.03091187 0.7660838 0.4415020 0.7649181 0.3661324 0.6638030 0.8551374
## cg25208881 cg17738613 cg03660162 cg00553601 cg11268585 cg25366315 cg00084271 cg16715186 cg02356645 cg26069044 cg05161773 cg11286989 cg26679884 cg21507367 cg27160885 cg04664583
## 200223270003_R02C01 0.1851956 0.6879612 0.8691767 0.05601299 0.2521544 0.9182318 0.8103611 0.2742789 0.5105903 0.9240187 0.4120912 0.7590008 0.6793815 0.9268560 0.2231606 0.5572814
## 200223270003_R03C01 0.9092286 0.6582258 0.5160770 0.58957701 0.8535791 0.9209800 0.7877006 0.7946153 0.5833923 0.9407223 0.4154907 0.8533989 0.1848705 0.9290102 0.8263885 0.5881190
## 200223270003_R06C01 0.9265502 0.1022257 0.9026304 0.62426500 0.9121931 0.8972984 0.7706165 0.8124316 0.5701428 0.9332131 0.8526849 0.7313884 0.1701734 0.9039559 0.2121179 0.9352717
## cg21812850 cg03737947 cg16771215 cg05799088 cg22112152 cg05392160 cg17653352 cg02372404 cg08745107 cg26983017 cg25436480 cg21209485 cg21139150 cg03327352 cg23923019 cg18150287
## 200223270003_R02C01 0.7920645 0.91824910 0.88389723 0.9023317 0.8476101 0.9328933 0.9269778 0.03598249 0.02921338 0.89868232 0.8425160 0.8865053 0.01853264 0.8851712 0.8555018 0.7685695
## 200223270003_R03C01 0.7688711 0.92067153 0.07196933 0.8779381 0.8014136 0.2576881 0.9086951 0.02767285 0.78542320 0.03145466 0.4994032 0.8714878 0.43223243 0.8786878 0.3058914 0.7519166
## 200223270003_R06C01 0.7702792 0.03638091 0.09949974 0.6887230 0.7897897 0.8920726 0.9341775 0.03127855 0.02709928 0.84677625 0.3494312 0.2292550 0.43772680 0.3042310 0.8108207 0.2501173
## cg15535896 cg05876883 cg23159970 cg06880438 cg02246922 cg25649515 cg05155812 cg17186592 cg24851651 cg15985500 cg02464073 cg08514194 cg10738648 cg11187460 cg27577781 cg10091792
## 200223270003_R02C01 0.3382952 0.9039064 0.61817246 0.8285145 0.7301201 0.9279829 0.4514427 0.9230463 0.03674702 0.8555262 0.4842537 0.9128478 0.44931577 0.03672179 0.8143535 0.8670733
## 200223270003_R03C01 0.9253926 0.9223308 0.57492600 0.7988881 0.9447019 0.9235753 0.9070932 0.8593448 0.05358297 0.8312198 0.4998933 0.2613138 0.49894016 0.92516409 0.8113185 0.5864221
## 200223270003_R06C01 0.3320191 0.4697980 0.03288909 0.7839538 0.7202230 0.5895839 0.4107396 0.8467599 0.05968923 0.8492103 0.9077933 0.9202187 0.05552024 0.03109553 0.8144274 0.6087997
## cg13815695 cg26948066 cg25306893 cg03129555 cg04462915 cg06697310 cg14582632 cg19301366 cg10666341 cg03221390 cg22169467 cg04831745 cg06864789 cg01933473 cg05891136 cg15586958
## 200223270003_R02C01 0.9267057 0.4685225 0.6265392 0.6079616 0.03224861 0.8454609 0.8475098 0.8831393 0.9046648 0.5859063 0.3095010 0.61984995 0.05369415 0.2589014 0.7797403 0.9058263
## 200223270003_R03C01 0.6859729 0.5026045 0.8330282 0.5785498 0.50740695 0.8653044 0.5526692 0.8072679 0.6731062 0.9180706 0.2978585 0.71214149 0.46053125 0.6726133 0.3310206 0.8957526
## 200223270003_R06C01 0.6509046 0.9101976 0.6175380 0.9137818 0.02700644 0.2405168 0.5288675 0.8796022 0.6443180 0.6399867 0.8955853 0.06871768 0.87513655 0.2642560 0.7965298 0.9121763
## cg26853071 cg11227702 cg15491125 cg16571124 cg10039445 cg09247979 cg04728936 cg13573375 cg05570109 cg12421087 cg00154902 cg04645024 cg13739190 cg20208879 cg04718469 cg08669168
## 200223270003_R02C01 0.4233820 0.86486075 0.9066635 0.9282854 0.8833873 0.5070956 0.2172057 0.8670419 0.3466611 0.5647607 0.5137741 0.7366541 0.8510103 0.66986658 0.8687522 0.9226769
## 200223270003_R03C01 0.7451354 0.49184121 0.3850991 0.9206431 0.8954055 0.5706177 0.1925451 0.1733934 0.5866750 0.5399655 0.8540746 0.8454827 0.8358482 0.02423079 0.7256813 0.9164547
## 200223270003_R06C01 0.4228079 0.02543724 0.9091504 0.9276842 0.8832807 0.5090215 0.2379376 0.8888246 0.4046471 0.5400348 0.8188126 0.0871902 0.8419471 0.61769424 0.8521881 0.6362087
## cg11314779 cg25879395 cg06403901 cg09727210 cg19377607 cg01549082 cg06371647 cg12012426 cg03549208 cg18993517 cg22666875 cg01008088 cg12333628 cg09216282 cg12146221 cg14192979
## 200223270003_R02C01 0.0242134 0.88130864 0.92790690 0.4240111 0.05377464 0.2924138 0.8336894 0.9165048 0.9014487 0.2091538 0.8177182 0.8424817 0.9227884 0.9349248 0.2049284 0.06336040
## 200223270003_R03C01 0.8966100 0.02603438 0.04783341 0.8812928 0.90570746 0.7065693 0.8198684 0.9434768 0.8381784 0.2665896 0.8291957 0.2417656 0.9092861 0.9244259 0.1814927 0.06019651
## 200223270003_R06C01 0.8908661 0.91060615 0.05253626 0.8493743 0.06636174 0.2895440 0.8069537 0.9220044 0.9097817 0.2574003 0.3694180 0.2618620 0.5084647 0.9263996 0.8619250 0.52114282
## cg22542451 cg03635532 cg07634717 cg10993865 cg14307563 cg14623940 cg16089727 cg26846609 cg04888234 cg17268094 cg06960717 cg26642936 cg14649234 cg06715136 cg07227024 cg15775217
## 200223270003_R02C01 0.5884356 0.8416733 0.7483382 0.9173768 0.1855966 0.7623774 0.86748697 0.48860949 0.8379655 0.5774753 0.7030978 0.7619266 0.05165754 0.3400192 0.04553128 0.5707441
## 200223270003_R03C01 0.8337068 0.8262538 0.8254434 0.9096170 0.8916957 0.8732905 0.54996692 0.04878986 0.4376314 0.9003262 0.7653402 0.7023413 0.79015014 0.9259109 0.05004286 0.9168327
## 200223270003_R06C01 0.8125084 0.8450480 0.8181246 0.4904519 0.8750052 0.8661720 0.05876736 0.48026945 0.8039047 0.8789368 0.7206218 0.7099380 0.65413166 0.9079807 0.06152206 0.6042521
## cg11540596 cg16536985 cg03088219 cg00689685 cg01153376 cg16180556 cg25169289 cg03982462 cg24883219 cg14240646
## 200223270003_R02C01 0.9238951 0.5789643 0.844002862 0.7019389 0.4872148 0.39300141 0.1100884 0.8562777 0.6430473 0.5391334
## 200223270003_R03C01 0.8926595 0.5418687 0.007435243 0.8634268 0.9639670 0.07312155 0.7667174 0.6023731 0.6822115 0.2538363
## 200223270003_R06C01 0.8820252 0.8392044 0.120155222 0.6378795 0.2242410 0.20051805 0.2264993 0.8778458 0.5296903 0.1864902
## [ reached 'max' / getOption("max.print") -- omitted 3 rows ]
dim(df_selected_Mean)
## [1] 554 251
print(selected_impAvg_ordered_NAME)
## [1] "cg27272246" "PC2" "cg12543766" "cg20685672" "cg14687298" "cg11331837" "cg16652920" "cg14168080" "age.now" "cg04248279" "cg24433124" "cg04412904" "cg07028768" "cg14710850" "cg08861434"
## [16] "cg06833284" "cg03924089" "cg20398163" "cg00004073" "cg00962106" "cg10240127" "cg06634367" "cg02225060" "cg04971651" "cg09015880" "cg19799454" "cg03979311" "cg07640670" "cg08198851" "cg02981548"
## [31] "cg11169344" "cg06961873" "cg23432430" "cg06483046" "cg07480955" "cg02621446" "cg26081710" "cg00767423" "cg22741595" "cg13405878" "cg10978526" "cg08880261" "cg22535849" "cg06546677" "cg20078646"
## [46] "cg17129965" "cg08779649" "cg23836570" "cg15633912" "cg23517115" "cg26705599" "cg18285382" "cg18819889" "cg23352245" "cg12228670" "cg26901661" "cg02772171" "cg06286533" "cg07104639" "cg17042243"
## [61] "cg06115838" "cg15098922" "cg07478795" "cg08788093" "cg12784167" "cg26219488" "cg22071943" "cg21415084" "cg01921484" "cg02887598" "cg18526121" "cg02631626" "cg09289202" "cg23066280" "cg08857872"
## [76] "cg00819121" "cg07504457" "cg11438323" "cg07158503" "cg19471911" "cg14564293" "cg18816397" "cg27086157" "PC1" "cg03749159" "cg21783012" "cg09584650" "cg21243064" "cg06231502" "cg00696044"
## [91] "cg14175932" "cg04242342" "cg10738049" "cg15501526" "cg21392220" "cg00322003" "cg05234269" "cg16779438" "cg14293999" "cg03723481" "cg06118351" "cg00086247" "cg15138543" "cg18918831" "cg12702014"
## [106] "cg25598710" "cg10681981" "cg01128042" "cg03395511" "cg22933800" "cg16655091" "cg17018422" "cg14228103" "cg11019791" "cg19097407" "cg23658987" "cg08138245" "cg24139837" "cg14507637" "cg04316537"
## [121] "cg12776173" "cg20300784" "cg17429539" "cg06394820" "cg21388339" "cg05130642" "cg12953206" "cg15600437" "cg25208881" "cg17738613" "cg03660162" "cg00553601" "cg11268585" "cg25366315" "cg00084271"
## [136] "cg16715186" "cg02356645" "cg26069044" "cg05161773" "cg11286989" "cg26679884" "cg21507367" "cg27160885" "cg04664583" "cg21812850" "cg03737947" "cg16771215" "cg05799088" "cg22112152" "cg05392160"
## [151] "cg17653352" "cg02372404" "cg08745107" "cg26983017" "cg25436480" "cg21209485" "cg21139150" "cg03327352" "cg23923019" "cg18150287" "cg15535896" "cg05876883" "cg23159970" "cg06880438" "cg02246922"
## [166] "cg25649515" "cg05155812" "cg17186592" "cg24851651" "cg15985500" "cg02464073" "cg08514194" "cg10738648" "cg11187460" "cg27577781" "cg10091792" "cg13815695" "cg26948066" "cg25306893" "cg03129555"
## [181] "cg04462915" "cg06697310" "cg14582632" "cg19301366" "cg10666341" "cg03221390" "cg22169467" "cg04831745" "cg06864789" "cg01933473" "cg05891136" "cg15586958" "cg26853071" "cg11227702" "cg15491125"
## [196] "cg16571124" "cg10039445" "cg09247979" "cg04728936" "cg13573375" "cg05570109" "cg12421087" "cg00154902" "cg04645024" "cg13739190" "cg20208879" "cg04718469" "cg08669168" "cg11314779" "cg25879395"
## [211] "cg06403901" "cg09727210" "cg19377607" "cg01549082" "cg06371647" "cg12012426" "cg03549208" "cg18993517" "cg22666875" "cg01008088" "cg12333628" "cg09216282" "cg12146221" "cg14192979" "cg22542451"
## [226] "cg03635532" "cg07634717" "cg10993865" "cg14307563" "cg14623940" "cg16089727" "cg26846609" "cg04888234" "cg17268094" "cg06960717" "cg26642936" "cg14649234" "cg06715136" "cg07227024" "cg15775217"
## [241] "cg11540596" "cg16536985" "cg03088219" "cg00689685" "cg01153376" "cg16180556" "cg25169289" "cg03982462" "cg24883219" "cg14240646"
output_mean_process<-processed_data[,c("DX",selected_impAvg_ordered_NAME)]
print(head(output_mean_process))
## # A tibble: 6 × 251
## DX cg27272246 PC2 cg12543766 cg20685672 cg14687298 cg11331837 cg16652920 cg14168080 age.now cg04248279 cg24433124 cg04412904 cg07028768 cg14710850 cg08861434 cg06833284 cg03924089 cg20398163
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 MCI 0.862 1.47e-2 0.510 0.671 0.0421 0.0369 0.944 0.419 82.4 0.853 0.132 0.0509 0.450 0.805 0.877 0.913 0.792 0.173
## 2 CN 0.871 5.75e-2 0.887 0.793 0.148 0.572 0.943 0.442 78.6 0.846 0.599 0.0772 0.854 0.809 0.435 0.900 0.737 0.873
## 3 CN 0.810 8.37e-2 0.0282 0.661 0.243 0.0318 0.946 0.436 80.4 0.833 0.819 0.0825 0.836 0.829 0.870 0.610 0.851 0.262
## 4 MCI 0.769 1.65e-5 0.818 0.0829 0.513 0.930 0.953 0.946 62.9 0.597 0.592 0.119 0.884 0.850 0.862 0.0381 0.869 0.267
## 5 CN 0.440 1.57e-2 0.457 0.845 0.0362 0.540 0.949 0.399 80.7 0.894 0.574 0.0889 0.451 0.821 0.906 0.915 0.748 0.682
## 6 MCI 0.750 3.46e-2 0.804 0.657 0.241 0.924 0.949 0.950 80.6 0.273 0.606 0.117 0.849 0.845 0.468 0.901 0.753 0.829
## # ℹ 232 more variables: cg00004073 <dbl>, cg00962106 <dbl>, cg10240127 <dbl>, cg06634367 <dbl>, cg02225060 <dbl>, cg04971651 <dbl>, cg09015880 <dbl>, cg19799454 <dbl>, cg03979311 <dbl>,
## # cg07640670 <dbl>, cg08198851 <dbl>, cg02981548 <dbl>, cg11169344 <dbl>, cg06961873 <dbl>, cg23432430 <dbl>, cg06483046 <dbl>, cg07480955 <dbl>, cg02621446 <dbl>, cg26081710 <dbl>,
## # cg00767423 <dbl>, cg22741595 <dbl>, cg13405878 <dbl>, cg10978526 <dbl>, cg08880261 <dbl>, cg22535849 <dbl>, cg06546677 <dbl>, cg20078646 <dbl>, cg17129965 <dbl>, cg08779649 <dbl>,
## # cg23836570 <dbl>, cg15633912 <dbl>, cg23517115 <dbl>, cg26705599 <dbl>, cg18285382 <dbl>, cg18819889 <dbl>, cg23352245 <dbl>, cg12228670 <dbl>, cg26901661 <dbl>, cg02772171 <dbl>,
## # cg06286533 <dbl>, cg07104639 <dbl>, cg17042243 <dbl>, cg06115838 <dbl>, cg15098922 <dbl>, cg07478795 <dbl>, cg08788093 <dbl>, cg12784167 <dbl>, cg26219488 <dbl>, cg22071943 <dbl>,
## # cg21415084 <dbl>, cg01921484 <dbl>, cg02887598 <dbl>, cg18526121 <dbl>, cg02631626 <dbl>, cg09289202 <dbl>, cg23066280 <dbl>, cg08857872 <dbl>, cg00819121 <dbl>, cg07504457 <dbl>,
## # cg11438323 <dbl>, cg07158503 <dbl>, cg19471911 <dbl>, cg14564293 <dbl>, cg18816397 <dbl>, cg27086157 <dbl>, PC1 <dbl>, cg03749159 <dbl>, cg21783012 <dbl>, cg09584650 <dbl>, cg21243064 <dbl>, …
dim(output_mean_process)
## [1] 554 251
Selected_median_imp <- head(combined_importance_quantiles,n = Number_fea_input)
print(head(Selected_median_imp))
## Feature 0% 25% 50% 75% 100%
## 1 age.now 0.003919221 0.01043952 0.7142857 0.7697314 1.0000000
## 323 PC2 0.000000000 0.28571429 0.6034537 1.0000000 1.0000000
## 320 cg27272246 0.448668503 0.56592979 0.6027351 0.6205822 0.7142857
## 256 cg20685672 0.428571429 0.51275876 0.5795757 0.5895764 0.6550619
## 282 cg23432430 0.083051637 0.28571429 0.5562570 0.5900073 0.6397029
## 2 cg00004073 0.000000000 0.44432903 0.5510762 0.5714286 0.6879460
Selected_median_imp_Name<-Selected_median_imp$Feature
print(head(Selected_median_imp_Name))
## [1] "age.now" "PC2" "cg27272246" "cg20685672" "cg23432430" "cg00004073"
df_selected_Median <- processed_dataFrame[,c("DX",Selected_median_imp_Name)]
output_median_feature<-processed_data[,c("DX",Selected_median_imp_Name)]
print(head(df_selected_Median))
## DX age.now PC2 cg27272246 cg20685672 cg23432430 cg00004073 cg06833284 cg14710850 cg03924089 cg13405878 cg10240127 cg14687298 cg16652920 cg04248279 cg24433124 cg12543766
## 200223270003_R02C01 MCI 82.4 0.01470293 0.8615873 0.6712101 0.9482702 0.02928535 0.9125144 0.8048592 0.7920449 0.4549662 0.9250553 0.04206702 0.9436000 0.8534976 0.1316610 0.51028134
## 200223270003_R03C01 CN 78.6 0.05745834 0.8705287 0.7932091 0.9455418 0.02787198 0.9003482 0.8090950 0.7370283 0.7858042 0.9403255 0.14813581 0.9431222 0.8458854 0.5987648 0.88741539
## 200223270003_R06C01 CN 80.4 0.08372861 0.8103777 0.6613646 0.9418716 0.64576463 0.6097933 0.8285902 0.8506756 0.7583938 0.9056974 0.24260002 0.9457161 0.8332786 0.8188082 0.02818501
## cg11331837 cg17129965 cg02225060 cg08857872 cg22741595 PC1 cg06961873 cg09015880 cg00962106 cg08198851 cg26901661 cg14168080 cg02981548 cg11169344 cg26081710 cg02621446
## 200223270003_R02C01 0.03692842 0.8972140 0.6828159 0.3395280 0.6525533 -0.214185447 0.5335591 0.5101716 0.9124898 0.6578905 0.8951971 0.4190123 0.1342571 0.6720163 0.8751040 0.8731313
## 200223270003_R03C01 0.57150125 0.8806673 0.8265195 0.8181845 0.1730013 -0.172761185 0.5472606 0.8402106 0.5375751 0.6578186 0.8754981 0.4420256 0.5220037 0.8215477 0.9198212 0.8095534
## 200223270003_R06C01 0.03182862 0.8857237 0.5209552 0.2970779 0.1550739 -0.003667305 0.9415177 0.8472063 0.5040948 0.1272153 0.9021064 0.4355521 0.5098965 0.5941114 0.8801892 0.7511582
## cg04412904 cg06231502 cg07028768 cg07104639 cg08880261 cg18285382 cg18819889 cg26219488 cg04971651 cg02631626 cg20078646 cg07480955 cg17042243 cg08861434 cg00086247 cg06634367
## 200223270003_R02C01 0.05088595 0.7784451 0.4496851 0.6772717 0.40655904 0.3202927 0.9156157 0.9336638 0.8902474 0.6280766 0.06198170 0.3874638 0.2502905 0.8768306 0.1761275 0.8695793
## 200223270003_R03C01 0.07717659 0.7964278 0.8536078 0.7123879 0.85616966 0.2930577 0.9004455 0.9134707 0.9219452 0.1951736 0.89537412 0.3916889 0.2933475 0.4352647 0.2045043 0.9512930
## 200223270003_R06C01 0.08253743 0.7706160 0.8356936 0.8099688 0.03280808 0.8923595 0.9054439 0.9261878 0.9035233 0.2699849 0.08725521 0.4043390 0.2725457 0.8698813 0.6901217 0.9544163
## cg06483046 cg10978526 cg07640670 cg23517115 cg07504457 cg00696044 cg21812850 cg17429539 cg20398163 cg12228670 cg14564293 cg03979311 cg12784167 cg06115838 cg07158503 cg02772171
## 200223270003_R02C01 0.04383925 0.5671930 0.58296513 0.2151144 0.7116230 0.55608424 0.7920645 0.7860900 0.1728144 0.8632174 0.52089591 0.86644909 0.81503498 0.8847724 0.5777146 0.9182018
## 200223270003_R03C01 0.50720277 0.9095713 0.55225610 0.9131440 0.6854539 0.07552381 0.7688711 0.7100923 0.8728944 0.8496212 0.04000662 0.06199853 0.02811410 0.8447916 0.6203543 0.5660559
## 200223270003_R06C01 0.89604910 0.8945157 0.04058533 0.8328364 0.7205633 0.79270858 0.7702792 0.7660838 0.2623391 0.8738949 0.04959460 0.72615553 0.03073269 0.8805585 0.6236025 0.8995479
## cg01921484 cg22933800 cg11438323 cg10039445 cg18816397 cg03660162 cg21243064 cg23352245 cg06118351 cg25208881 cg14175932 cg16715186 cg10681981 cg08788093 cg26679884 cg14293999
## 200223270003_R02C01 0.9098550 0.4830774 0.4863471 0.8833873 0.5472925 0.8691767 0.5191606 0.9377232 0.3633940 0.1851956 0.5746953 0.2742789 0.7035090 0.03911678 0.6793815 0.2836710
## 200223270003_R03C01 0.9093137 0.4142525 0.8984559 0.8954055 0.4940355 0.5160770 0.9167649 0.9375774 0.4714860 0.9092286 0.8779027 0.7946153 0.7382662 0.60934160 0.1848705 0.9172023
## 200223270003_R06C01 0.9204487 0.3956683 0.8722772 0.8832807 0.5337018 0.9026304 0.4862205 0.5932742 0.8655962 0.9265502 0.7288239 0.8124316 0.6971989 0.88380243 0.1701734 0.9168166
## cg06546677 cg00819121 cg18526121 cg23066280 cg23923019 cg07478795 cg21139150 cg15633912 cg08138245 cg15098922 cg21392220 cg06880438 cg04664583 cg14507637 cg21388339 cg15501526
## 200223270003_R02C01 0.4472216 0.9207001 0.4519781 0.07247841 0.8555018 0.8911007 0.01853264 0.1605530 0.8115760 0.9286092 0.8726204 0.8285145 0.5572814 0.9051258 0.2756268 0.6362531
## 200223270003_R03C01 0.8484609 0.9281472 0.4762313 0.57174588 0.3058914 0.9095543 0.43223243 0.9333421 0.1109940 0.9027517 0.8563905 0.7988881 0.5881190 0.9009460 0.2102269 0.6319253
## 200223270003_R06C01 0.5636023 0.9327211 0.4833367 0.80814756 0.8108207 0.8905903 0.43772680 0.8737362 0.7444698 0.8525611 0.8466199 0.7839538 0.9352717 0.9013686 0.7649181 0.7435100
## cg25366315 cg10738648 cg17738613 cg08779649 cg16779438 cg10738049 cg15600437 cg10091792 cg19471911 cg11286989 cg02887598 cg12146221 cg26948066 cg27086157 cg26853071 cg04316537
## 200223270003_R02C01 0.9182318 0.44931577 0.6879612 0.44449401 0.8826150 0.5441211 0.4885353 0.8670733 0.6334393 0.7590008 0.04020908 0.2049284 0.4685225 0.9224112 0.4233820 0.8074830
## 200223270003_R03C01 0.9209800 0.49894016 0.6582258 0.45076825 0.5466924 0.5232715 0.4894487 0.5864221 0.8437175 0.8533989 0.67073881 0.1814927 0.5026045 0.9219304 0.7451354 0.8453340
## 200223270003_R06C01 0.8972984 0.05552024 0.1022257 0.04810217 0.8629492 0.4875473 0.8551374 0.6087997 0.6127952 0.7313884 0.73408417 0.8619250 0.9101976 0.3224986 0.4228079 0.4351695
## cg06960717 cg13739190 cg06403901 cg14582632 cg21507367 cg17186592 cg07634717 cg09216282 cg22112152 cg00084271 cg19301366 cg00154902 cg23836570 cg05234269 cg19799454 cg26705599
## 200223270003_R02C01 0.7030978 0.8510103 0.92790690 0.8475098 0.9268560 0.9230463 0.7483382 0.9349248 0.8476101 0.8103611 0.8831393 0.5137741 0.58688450 0.93848584 0.9178930 0.8585917
## 200223270003_R03C01 0.7653402 0.8358482 0.04783341 0.5526692 0.9290102 0.8593448 0.8254434 0.9244259 0.8014136 0.7877006 0.8072679 0.8540746 0.54259383 0.57461229 0.9106247 0.8613854
## 200223270003_R06C01 0.7206218 0.8419471 0.05253626 0.5288675 0.9039559 0.8467599 0.8181246 0.9263996 0.7897897 0.7706165 0.8796022 0.8188126 0.03267304 0.02467208 0.9066551 0.4332832
## cg04718469 cg05799088 cg10666341 cg12333628 cg15985500 cg16202259 cg16771215 cg27160885 cg12689021 cg13815695 cg14307563 cg25436480 cg03982462 cg00767423 cg12421087 cg22535849
## 200223270003_R02C01 0.8687522 0.9023317 0.9046648 0.9227884 0.8555262 0.9548726 0.88389723 0.2231606 0.7706828 0.9267057 0.1855966 0.8425160 0.8562777 0.9298253 0.5647607 0.8847704
## 200223270003_R03C01 0.7256813 0.8779381 0.6731062 0.9092861 0.8312198 0.3713483 0.07196933 0.8263885 0.7449475 0.6859729 0.8916957 0.4994032 0.6023731 0.2651854 0.5399655 0.8609966
## 200223270003_R06C01 0.8521881 0.6887230 0.6443180 0.5084647 0.8492103 0.4852461 0.09949974 0.2121179 0.7872237 0.6509046 0.8750052 0.3494312 0.8778458 0.8667808 0.5400348 0.8808022
## cg11268585 cg24139837 cg04728936 cg01128042 cg06394820 cg08669168 cg09727210 cg06286533 cg18918831 cg20678988 cg11019791 cg06715136 cg15138543 cg11133939 cg15775217 cg21415084
## 200223270003_R02C01 0.2521544 0.07404605 0.2172057 0.9113420 0.8513195 0.9226769 0.4240111 0.2734841 0.4891660 0.8438718 0.8112324 0.3400192 0.7734778 0.1282694 0.5707441 0.8374415
## 200223270003_R03C01 0.8535791 0.04183445 0.1925451 0.5328806 0.8695521 0.9164547 0.8812928 0.9354924 0.5333801 0.8548886 0.7831231 0.9259109 0.2949313 0.5920898 0.9168327 0.8509420
## 200223270003_R06C01 0.9121931 0.05657120 0.2379376 0.5222757 0.4415020 0.6362087 0.8493743 0.8696546 0.6406575 0.7786685 0.4353250 0.9079807 0.2496147 0.5127706 0.6042521 0.8378237
## cg20208879 cg22071943 cg02372404 cg05891136 cg03327352 cg25879395 cg02356645 cg04540199 cg09584650 cg26642936 cg21783012 cg12702014 cg11540596 cg16180556 cg22542451 cg19097407
## 200223270003_R02C01 0.66986658 0.8705217 0.03598249 0.7797403 0.8851712 0.88130864 0.5105903 0.8165865 0.08230254 0.7619266 0.9142369 0.7704049 0.9238951 0.39300141 0.5884356 0.1417931
## 200223270003_R03C01 0.02423079 0.2442648 0.02767285 0.3310206 0.8786878 0.02603438 0.5833923 0.7964195 0.09661586 0.7023413 0.6694884 0.7848681 0.8926595 0.07312155 0.8337068 0.8367297
## 200223270003_R06C01 0.61769424 0.2644581 0.03127855 0.7965298 0.3042310 0.91060615 0.5701428 0.4698047 0.52399749 0.7099380 0.9070112 0.8065993 0.8820252 0.20051805 0.8125084 0.2276425
## cg06697310 cg04242342 cg05155812 cg26983017 cg00322003 cg11882358 cg05130642 cg04462915 cg17653352 cg20300784 cg07227024 cg03723481 cg26069044 cg06371647 cg12953206 cg01008088
## 200223270003_R02C01 0.8454609 0.8206769 0.4514427 0.89868232 0.1759911 0.89136326 0.8575504 0.03224861 0.9269778 0.86585964 0.04553128 0.4347333 0.9240187 0.8336894 0.2364836 0.8424817
## 200223270003_R03C01 0.8653044 0.8167892 0.9070932 0.03145466 0.5702070 0.04943344 0.8644077 0.50740695 0.9086951 0.86609999 0.05004286 0.9007774 0.9407223 0.8198684 0.2338141 0.2417656
## 200223270003_R06C01 0.2405168 0.8040357 0.4107396 0.84677625 0.3077122 0.80176322 0.3661324 0.02700644 0.9341775 0.03091187 0.06152206 0.8947417 0.9332131 0.8069537 0.6638030 0.2618620
## cg14623940 cg24851651 cg15586958 cg03395511 cg21209485 cg25598710 cg08745107 cg00553601 cg04645024 cg04831745 cg14228103 cg05876883 cg00512739 cg03749159 cg14240646 cg01153376
## 200223270003_R02C01 0.7623774 0.03674702 0.9058263 0.4491605 0.8865053 0.3105752 0.02921338 0.05601299 0.7366541 0.61984995 0.9141064 0.9039064 0.9337648 0.9355921 0.5391334 0.4872148
## 200223270003_R03C01 0.8732905 0.05358297 0.8957526 0.4835967 0.8714878 0.3088142 0.78542320 0.58957701 0.8454827 0.71214149 0.8591302 0.9223308 0.8863895 0.9153921 0.2538363 0.9639670
## 200223270003_R06C01 0.8661720 0.05968923 0.9121763 0.5523959 0.2292550 0.8538820 0.02709928 0.62426500 0.0871902 0.06871768 0.1834348 0.4697980 0.9242748 0.9255807 0.1864902 0.2242410
## cg03600007 cg27577781 cg22169467 cg10993865 cg16089727 cg16536985 cg03129555 cg03549208 cg05161773 cg19377607 cg22666875 cg24634455 cg16655091 cg06012903 cg17061760 cg11187460
## 200223270003_R02C01 0.5658487 0.8143535 0.3095010 0.9173768 0.86748697 0.5789643 0.6079616 0.9014487 0.4120912 0.05377464 0.8177182 0.7796391 0.6055295 0.7964595 0.08726914 0.03672179
## 200223270003_R03C01 0.6018832 0.8113185 0.2978585 0.9096170 0.54996692 0.5418687 0.5785498 0.8381784 0.4154907 0.90570746 0.8291957 0.5188241 0.7053336 0.1933431 0.59377488 0.92516409
## 200223270003_R06C01 0.8611166 0.8144274 0.8955853 0.4904519 0.05876736 0.8392044 0.9137818 0.9097817 0.8526849 0.06636174 0.3694180 0.5325725 0.8724479 0.1960773 0.83354475 0.03109553
## cg06864789 cg25306893 cg01910713 cg01549082 cg03635532 cg02078724 cg09247979 cg03737947 cg10890644 cg04888234 cg12012426 cg00689685 cg17268094 cg17018422 cg00247094 cg02495179
## 200223270003_R02C01 0.05369415 0.6265392 0.8573169 0.2924138 0.8416733 0.3096774 0.5070956 0.91824910 0.1402372 0.8379655 0.9165048 0.7019389 0.5774753 0.5262747 0.5399349 0.6813307
## 200223270003_R03C01 0.46053125 0.8330282 0.8538850 0.7065693 0.8262538 0.2896133 0.5706177 0.92067153 0.1348023 0.4376314 0.9434768 0.8634268 0.9003262 0.9029604 0.9315640 0.7373055
## 200223270003_R06C01 0.87513655 0.6175380 0.8110366 0.2895440 0.8450480 0.2805612 0.5090215 0.03638091 0.1407028 0.8039047 0.9220044 0.6378795 0.8789368 0.5100750 0.5177874 0.5588114
## cg18949721 cg12063064 cg08914944 cg23159970 cg09289202 cg08096656 cg19242610 cg01023242 cg04768387 cg05392160
## 200223270003_R02C01 0.2334245 0.9357515 0.63423942 0.61817246 0.4361103 0.9362594 0.5188218 0.7210683 0.3131047 0.9328933
## 200223270003_R03C01 0.2437792 0.9436901 0.04392811 0.57492600 0.4397504 0.9314878 0.9236389 0.9032685 0.9465814 0.2576881
## 200223270003_R06C01 0.2523095 0.5490657 0.06893322 0.03288909 0.4193555 0.4943033 0.8761320 0.7831190 0.9098563 0.8920726
## [ reached 'max' / getOption("max.print") -- omitted 3 rows ]
dim(df_selected_Median)
## [1] 554 251
print(Selected_median_imp_Name)
## [1] "age.now" "PC2" "cg27272246" "cg20685672" "cg23432430" "cg00004073" "cg06833284" "cg14710850" "cg03924089" "cg13405878" "cg10240127" "cg14687298" "cg16652920" "cg04248279" "cg24433124"
## [16] "cg12543766" "cg11331837" "cg17129965" "cg02225060" "cg08857872" "cg22741595" "PC1" "cg06961873" "cg09015880" "cg00962106" "cg08198851" "cg26901661" "cg14168080" "cg02981548" "cg11169344"
## [31] "cg26081710" "cg02621446" "cg04412904" "cg06231502" "cg07028768" "cg07104639" "cg08880261" "cg18285382" "cg18819889" "cg26219488" "cg04971651" "cg02631626" "cg20078646" "cg07480955" "cg17042243"
## [46] "cg08861434" "cg00086247" "cg06634367" "cg06483046" "cg10978526" "cg07640670" "cg23517115" "cg07504457" "cg00696044" "cg21812850" "cg17429539" "cg20398163" "cg12228670" "cg14564293" "cg03979311"
## [61] "cg12784167" "cg06115838" "cg07158503" "cg02772171" "cg01921484" "cg22933800" "cg11438323" "cg10039445" "cg18816397" "cg03660162" "cg21243064" "cg23352245" "cg06118351" "cg25208881" "cg14175932"
## [76] "cg16715186" "cg10681981" "cg08788093" "cg26679884" "cg14293999" "cg06546677" "cg00819121" "cg18526121" "cg23066280" "cg23923019" "cg07478795" "cg21139150" "cg15633912" "cg08138245" "cg15098922"
## [91] "cg21392220" "cg06880438" "cg04664583" "cg14507637" "cg21388339" "cg15501526" "cg25366315" "cg10738648" "cg17738613" "cg08779649" "cg16779438" "cg10738049" "cg15600437" "cg10091792" "cg19471911"
## [106] "cg11286989" "cg02887598" "cg12146221" "cg26948066" "cg27086157" "cg26853071" "cg04316537" "cg06960717" "cg13739190" "cg06403901" "cg14582632" "cg21507367" "cg17186592" "cg07634717" "cg09216282"
## [121] "cg22112152" "cg00084271" "cg19301366" "cg00154902" "cg23836570" "cg05234269" "cg19799454" "cg26705599" "cg04718469" "cg05799088" "cg10666341" "cg12333628" "cg15985500" "cg16202259" "cg16771215"
## [136] "cg27160885" "cg12689021" "cg13815695" "cg14307563" "cg25436480" "cg03982462" "cg00767423" "cg12421087" "cg22535849" "cg11268585" "cg24139837" "cg04728936" "cg01128042" "cg06394820" "cg08669168"
## [151] "cg09727210" "cg06286533" "cg18918831" "cg20678988" "cg11019791" "cg06715136" "cg15138543" "cg11133939" "cg15775217" "cg21415084" "cg20208879" "cg22071943" "cg02372404" "cg05891136" "cg03327352"
## [166] "cg25879395" "cg02356645" "cg04540199" "cg09584650" "cg26642936" "cg21783012" "cg12702014" "cg11540596" "cg16180556" "cg22542451" "cg19097407" "cg06697310" "cg04242342" "cg05155812" "cg26983017"
## [181] "cg00322003" "cg11882358" "cg05130642" "cg04462915" "cg17653352" "cg20300784" "cg07227024" "cg03723481" "cg26069044" "cg06371647" "cg12953206" "cg01008088" "cg14623940" "cg24851651" "cg15586958"
## [196] "cg03395511" "cg21209485" "cg25598710" "cg08745107" "cg00553601" "cg04645024" "cg04831745" "cg14228103" "cg05876883" "cg00512739" "cg03749159" "cg14240646" "cg01153376" "cg03600007" "cg27577781"
## [211] "cg22169467" "cg10993865" "cg16089727" "cg16536985" "cg03129555" "cg03549208" "cg05161773" "cg19377607" "cg22666875" "cg24634455" "cg16655091" "cg06012903" "cg17061760" "cg11187460" "cg06864789"
## [226] "cg25306893" "cg01910713" "cg01549082" "cg03635532" "cg02078724" "cg09247979" "cg03737947" "cg10890644" "cg04888234" "cg12012426" "cg00689685" "cg17268094" "cg17018422" "cg00247094" "cg02495179"
## [241] "cg18949721" "cg12063064" "cg08914944" "cg23159970" "cg09289202" "cg08096656" "cg19242610" "cg01023242" "cg04768387" "cg05392160"
print(head(output_median_feature))
## # A tibble: 6 × 251
## DX age.now PC2 cg27272246 cg20685672 cg23432430 cg00004073 cg06833284 cg14710850 cg03924089 cg13405878 cg10240127 cg14687298 cg16652920 cg04248279 cg24433124 cg12543766 cg11331837 cg17129965
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 MCI 82.4 1.47e-2 0.862 0.671 0.948 0.0293 0.913 0.805 0.792 0.455 0.925 0.0421 0.944 0.853 0.132 0.510 0.0369 0.897
## 2 CN 78.6 5.75e-2 0.871 0.793 0.946 0.0279 0.900 0.809 0.737 0.786 0.940 0.148 0.943 0.846 0.599 0.887 0.572 0.881
## 3 CN 80.4 8.37e-2 0.810 0.661 0.942 0.646 0.610 0.829 0.851 0.758 0.906 0.243 0.946 0.833 0.819 0.0282 0.0318 0.886
## 4 MCI 62.9 1.65e-5 0.769 0.0829 0.946 0.412 0.0381 0.850 0.869 0.448 0.926 0.513 0.953 0.597 0.592 0.818 0.930 0.874
## 5 CN 80.7 1.57e-2 0.440 0.845 0.951 0.393 0.915 0.821 0.748 0.340 0.924 0.0362 0.949 0.894 0.574 0.457 0.540 0.882
## 6 MCI 80.6 3.46e-2 0.750 0.657 0.515 0.404 0.901 0.845 0.753 0.734 0.907 0.241 0.949 0.273 0.606 0.804 0.924 0.776
## # ℹ 232 more variables: cg02225060 <dbl>, cg08857872 <dbl>, cg22741595 <dbl>, PC1 <dbl>, cg06961873 <dbl>, cg09015880 <dbl>, cg00962106 <dbl>, cg08198851 <dbl>, cg26901661 <dbl>, cg14168080 <dbl>,
## # cg02981548 <dbl>, cg11169344 <dbl>, cg26081710 <dbl>, cg02621446 <dbl>, cg04412904 <dbl>, cg06231502 <dbl>, cg07028768 <dbl>, cg07104639 <dbl>, cg08880261 <dbl>, cg18285382 <dbl>,
## # cg18819889 <dbl>, cg26219488 <dbl>, cg04971651 <dbl>, cg02631626 <dbl>, cg20078646 <dbl>, cg07480955 <dbl>, cg17042243 <dbl>, cg08861434 <dbl>, cg00086247 <dbl>, cg06634367 <dbl>,
## # cg06483046 <dbl>, cg10978526 <dbl>, cg07640670 <dbl>, cg23517115 <dbl>, cg07504457 <dbl>, cg00696044 <dbl>, cg21812850 <dbl>, cg17429539 <dbl>, cg20398163 <dbl>, cg12228670 <dbl>,
## # cg14564293 <dbl>, cg03979311 <dbl>, cg12784167 <dbl>, cg06115838 <dbl>, cg07158503 <dbl>, cg02772171 <dbl>, cg01921484 <dbl>, cg22933800 <dbl>, cg11438323 <dbl>, cg10039445 <dbl>,
## # cg18816397 <dbl>, cg03660162 <dbl>, cg21243064 <dbl>, cg23352245 <dbl>, cg06118351 <dbl>, cg25208881 <dbl>, cg14175932 <dbl>, cg16715186 <dbl>, cg10681981 <dbl>, cg08788093 <dbl>,
## # cg26679884 <dbl>, cg14293999 <dbl>, cg06546677 <dbl>, cg00819121 <dbl>, cg18526121 <dbl>, cg23066280 <dbl>, cg23923019 <dbl>, cg07478795 <dbl>, cg21139150 <dbl>, cg15633912 <dbl>, …
Next, choose the mutually important features: those that appear in the top lists of at least half of the models (i.e. 3 of the 5 in our case).
The frequency / common feature importance is computed as follows:
n_select_frequencyWay <- Number_fea_input
combined_importance_freq_ordered_df <- combined_importance_Avg_ordered
df_Selected_Frequency_Imp <- function(n_select_frequencyWay,FeatureImportanceTable){
# Input: the combined feature-importance data frame.
# For each model, rank features by importance and keep the top n;
# the output is a feature frequency table, i.e. how often each feature
# appears among the top-n selected features across models.
# LRM
## All_impAvg_orderby_LRM
All_impAvg_orderby_LRM <- FeatureImportanceTable[order(-FeatureImportanceTable$Importance_LRM1),]
## top_impAvg_orderby_LRM
top_impAvg_orderby_LRM <- head(All_impAvg_orderby_LRM,n = n_select_frequencyWay)
top_impAvg_orderby_LRM_NAME <- top_impAvg_orderby_LRM$Feature
# XGB
## All_impAvg_orderby_XGB
All_impAvg_orderby_XGB <- FeatureImportanceTable[order(-FeatureImportanceTable$Importance_XGB),]
## top_impAvg_orderby_XGB
top_impAvg_orderby_XGB <- head(All_impAvg_orderby_XGB,n = n_select_frequencyWay)
top_impAvg_orderby_XGB_NAME <- top_impAvg_orderby_XGB$Feature
# ENM
## all_impAvg_orderby_ENM
All_impAvg_orderby_ENM <- FeatureImportanceTable[order(-FeatureImportanceTable$Importance_ENM1),]
## top_impAvg_orderby_ENM
top_impAvg_orderby_ENM <- head(All_impAvg_orderby_ENM,n = n_select_frequencyWay)
top_impAvg_orderby_ENM_NAME <- top_impAvg_orderby_ENM$Feature
# RF
## all_impAvg_orderby_RF
All_impAvg_orderby_RF <- FeatureImportanceTable[order(-FeatureImportanceTable$Importance_RF),]
## top_impAvg_orderby_RF
top_impAvg_orderby_RF <- head(All_impAvg_orderby_RF,n = n_select_frequencyWay)
top_impAvg_orderby_RF_NAME <- top_impAvg_orderby_RF$Feature
# SVM
## all_impAvg_orderby_SVM
All_impAvg_orderby_SVM <- FeatureImportanceTable[order(-FeatureImportanceTable$Importance_SVM),]
## top_impAvg_orderby_SVM
top_impAvg_orderby_SVM <- head(All_impAvg_orderby_SVM,n = n_select_frequencyWay)
top_impAvg_orderby_SVM_NAME <- top_impAvg_orderby_SVM$Feature
# Combine all features into a unique collection
all_features <- unique(c(top_impAvg_orderby_LRM_NAME, top_impAvg_orderby_XGB_NAME, top_impAvg_orderby_ENM_NAME,top_impAvg_orderby_RF_NAME,top_impAvg_orderby_SVM_NAME))
models<-c("LRM","XGB","ENM","RF","SVM")
feature_matrix <- matrix(0, nrow = length(all_features), ncol = length(models),
dimnames = list(all_features, models))
# Fill the matrix, indicating presence (1) or absence (0) of each feature in each model
for (feature in all_features) {
feature_matrix[feature, "LRM"] <-
as.integer(feature %in% top_impAvg_orderby_LRM_NAME)
feature_matrix[feature, "XGB"] <-
as.integer(feature %in% top_impAvg_orderby_XGB_NAME)
feature_matrix[feature, "ENM"] <-
as.integer(feature %in% top_impAvg_orderby_ENM_NAME)
feature_matrix[feature, "RF"] <-
as.integer(feature %in% top_impAvg_orderby_RF_NAME)
feature_matrix[feature, "SVM"] <-
as.integer(feature %in% top_impAvg_orderby_SVM_NAME)
}
# Convert the matrix to a data frame
feature_df <- as.data.frame(feature_matrix)
feature_df$Total_Count <- rowSums(feature_df[,1:5])
# Sort the dataframe by the Total_Count in descending order
feature_df <- feature_df[order(-feature_df$Total_Count), ]
print(feature_df)
return(feature_df)
}
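The five per-model blocks above repeat the same rank-and-take-top-n pattern. As an aside, the whole function can be sketched more compactly with a loop over the importance columns; this is a behavior-equivalent sketch, not the version used in this document (the helper name is ours; the column names are assumed to match the importance table above):

```r
# Compact sketch of df_Selected_Frequency_Imp: rank features by each model's
# importance column, keep the top n per model, then tally in how many models'
# top lists each feature appears.
compact_frequency_imp <- function(n_top, imp_table){
  # Map model labels to their importance columns (names as used above)
  imp_cols <- c(LRM = "Importance_LRM1", XGB = "Importance_XGB",
                ENM = "Importance_ENM1", RF  = "Importance_RF",
                SVM = "Importance_SVM")
  # Top-n feature names per model
  top_names <- lapply(imp_cols, function(col){
    head(imp_table[order(-imp_table[[col]]), ], n_top)$Feature
  })
  all_features <- unique(unlist(top_names, use.names = FALSE))
  # Presence matrix: 1 if the feature is in that model's top-n list
  feature_df <- as.data.frame(
    sapply(names(imp_cols),
           function(m) as.integer(all_features %in% top_names[[m]])),
    row.names = all_features)
  feature_df$Total_Count <- rowSums(feature_df)
  feature_df[order(-feature_df$Total_Count), ]
}
```

The loop version avoids the copy-paste blocks, so adding or removing a model only touches the `imp_cols` vector.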
Now, the function will be tested below:
df_Func_test<-df_Selected_Frequency_Imp(NUM_COMMON_FEATURES_SET_Frequency,combined_importance_freq_ordered_df)
## LRM XGB ENM RF SVM Total_Count
## cg27272246 1 1 1 0 1 4
## cg14687298 1 1 1 0 1 4
## cg14710850 1 0 1 0 1 3
## cg20685672 1 1 1 0 0 3
## cg16652920 1 1 1 0 0 3
## cg24433124 1 0 1 0 1 3
## cg12543766 1 1 1 0 0 3
## age.now 0 1 0 1 1 3
## PC2 1 0 1 0 0 2
## cg23432430 1 0 1 0 0 2
## cg00004073 1 0 0 1 0 2
## cg13405878 1 0 1 0 0 2
## cg02981548 1 0 1 0 0 2
## cg03924089 1 0 1 0 0 2
## cg07480955 1 0 1 0 0 2
## cg17129965 1 0 1 0 0 2
## cg11331837 0 1 0 0 1 2
## cg04412904 0 1 0 1 0 2
## cg10240127 0 1 0 0 1 2
## cg08880261 0 1 0 1 0 2
## cg14168080 0 0 0 1 1 2
## cg14582632 1 0 0 0 0 1
## cg08788093 1 0 0 0 0 1
## cg21243064 1 0 0 0 0 1
## cg02225060 1 0 0 0 0 1
## cg19471911 1 0 0 0 0 1
## cg22535849 0 1 0 0 0 1
## cg19799454 0 1 0 0 0 1
## cg06961873 0 1 0 0 0 1
## cg23836570 0 1 0 0 0 1
## cg20078646 0 1 0 0 0 1
## cg04316537 0 1 0 0 0 1
## cg18285382 0 1 0 0 0 1
## cg04718469 0 1 0 0 0 1
## PC1 0 1 0 0 0 1
## cg02621446 0 1 0 0 0 1
## cg07028768 0 0 1 0 0 1
## cg00962106 0 0 1 0 0 1
## cg09015880 0 0 1 0 0 1
## cg00086247 0 0 1 0 0 1
## cg06833284 0 0 1 0 0 1
## cg06634367 0 0 1 0 0 1
## cg06286533 0 0 0 1 0 1
## cg03749159 0 0 0 1 0 1
## cg18526121 0 0 0 1 0 1
## cg24851651 0 0 0 1 0 1
## cg23658987 0 0 0 1 0 1
## cg26081710 0 0 0 1 0 1
## cg27086157 0 0 0 1 0 1
## cg15501526 0 0 0 1 0 1
## cg10864200 0 0 0 1 0 1
## cg02464073 0 0 0 1 0 1
## cg09289202 0 0 0 1 0 1
## cg18819889 0 0 0 1 0 1
## cg12228670 0 0 0 1 0 1
## cg27160885 0 0 0 1 0 1
## cg03979311 0 0 0 1 0 1
## cg05161773 0 0 0 0 1 1
## cg26705599 0 0 0 0 1 1
## cg12776173 0 0 0 0 1 1
## cg07640670 0 0 0 0 1 1
## cg00767423 0 0 0 0 1 1
## cg10978526 0 0 0 0 1 1
## cg06546677 0 0 0 0 1 1
## cg26901661 0 0 0 0 1 1
## cg02772171 0 0 0 0 1 1
## cg06115838 0 0 0 0 1 1
## cg07478795 0 0 0 0 1 1
## cg12784167 0 0 0 0 1 1
# Sanity check: the output should be 0, i.e. no mismatches with the precomputed table.
sum(df_Func_test!=frequency_feature_df_RAW_ordered)
## [1] 0
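The selection rule applied later, keeping a feature when it appears in at least half of the models' top lists, amounts to a majority filter on the `Total_Count` column. A minimal sketch (the function name is ours; the presence-table layout is assumed as printed above):

```r
# Majority filter on a presence table: keep features whose Total_Count is at
# least half (rounded up) of the number of models.
select_majority_features <- function(feature_df, n_models = 5){
  threshold <- ceiling(n_models / 2)  # 3 when there are 5 models
  rownames(feature_df)[feature_df$Total_Count >= threshold]
}
```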
Choose the mutually important features that appear in at least half of the models (i.e. 3 in our case).
The frequency / common feature importance is processed as follows:
n_select_frequencyWay <- Number_fea_input
df_feature_Output_frequency <- df_Selected_Frequency_Imp(Number_fea_input,
combined_importance_freq_ordered_df)
## LRM XGB ENM RF SVM Total_Count
## cg27272246 1 1 1 1 1 5
## cg14710850 1 1 1 1 1 5
## cg00004073 1 1 1 1 1 5
## cg13405878 1 1 1 1 1 5
## cg02981548 1 1 1 1 1 5
## cg20685672 1 1 1 1 1 5
## cg08788093 1 1 1 1 1 5
## cg03924089 1 1 1 1 1 5
## cg16652920 1 1 1 1 1 5
## cg24433124 1 1 1 1 1 5
## cg12543766 1 1 1 1 1 5
## cg14687298 1 1 1 1 1 5
## cg17129965 1 1 1 1 1 5
## cg06833284 1 1 1 1 1 5
## cg11169344 1 1 1 1 1 5
## cg14168080 1 1 1 1 1 5
## cg26081710 1 1 1 1 1 5
## cg02631626 1 1 1 1 1 5
## cg17042243 1 1 1 1 1 5
## cg08861434 1 1 1 1 1 5
## cg07640670 1 1 1 1 1 5
## cg20398163 1 1 1 1 1 5
## cg03979311 1 1 1 1 1 5
## cg10978526 1 1 1 1 1 5
## cg22933800 1 1 1 1 1 5
## cg07028768 1 1 1 1 1 5
## cg04248279 1 1 1 1 1 5
## cg07504457 1 1 1 1 1 5
## cg14175932 1 1 1 1 1 5
## cg26219488 1 1 1 1 1 5
## cg06231502 1 1 1 1 1 5
## cg06115838 1 1 1 1 1 5
## cg06483046 1 1 1 1 1 5
## cg07104639 1 1 1 1 1 5
## cg21392220 1 1 1 1 1 5
## cg18819889 1 1 1 1 1 5
## cg08779649 1 1 1 1 1 5
## cg08198851 1 1 1 1 1 5
## cg17186592 1 1 1 1 1 5
## cg06546677 1 1 1 1 1 5
## cg26705599 1 1 1 1 1 5
## cg23517115 1 1 1 1 1 5
## cg00819121 1 1 1 1 1 5
## cg11268585 1 1 1 1 1 5
## cg13815695 1 1 1 1 1 5
## cg06286533 1 1 1 1 1 5
## cg23352245 1 1 1 1 1 5
## cg26679884 1 1 1 1 1 5
## cg18816397 1 1 1 1 1 5
## cg15633912 1 1 1 1 1 5
## cg10681981 1 1 1 1 1 5
## cg01128042 1 1 1 1 1 5
## cg02772171 1 1 1 1 1 5
## cg06394820 1 1 1 1 1 5
## cg17738613 1 1 1 1 1 5
## cg04242342 1 1 1 1 1 5
## cg22741595 1 1 1 1 1 5
## cg02887598 1 1 1 1 1 5
## cg22071943 1 1 1 1 1 5
## cg04971651 1 1 1 1 1 5
## cg19097407 1 1 1 1 1 5
## cg15586958 1 1 1 1 1 5
## cg08857872 1 1 1 1 1 5
## cg02621446 1 1 1 1 1 5
## cg00553601 1 1 1 1 1 5
## cg00767423 1 1 1 1 1 5
## cg18285382 1 1 1 1 1 5
## cg15098922 1 1 1 1 1 5
## cg15138543 1 1 1 1 1 5
## cg08745107 1 1 1 1 1 5
## cg00696044 1 1 1 1 1 5
## cg03749159 1 1 1 1 1 5
## cg21415084 1 1 1 1 1 5
## cg05155812 1 1 1 1 1 5
## cg22112152 1 1 1 1 1 5
## cg14293999 1 1 1 1 1 5
## cg16655091 1 1 1 1 1 5
## cg03327352 1 1 1 1 1 5
## cg06961873 1 1 1 1 1 5
## cg08880261 1 1 1 1 1 5
## cg11286989 1 1 1 1 1 5
## cg06118351 1 1 1 1 1 5
## cg25366315 1 1 1 1 1 5
## cg26853071 1 1 1 1 1 5
## cg25436480 1 1 1 1 1 5
## cg26983017 1 1 1 1 1 5
## cg26901661 1 1 1 1 1 5
## cg21139150 1 1 1 1 1 5
## cg10738049 1 1 1 1 1 5
## cg03129555 1 1 1 1 1 5
## cg17018422 1 1 1 1 1 5
## cg04664583 1 1 1 1 1 5
## cg03660162 1 1 1 1 1 5
## cg15600437 1 1 1 1 1 5
## cg00322003 1 1 1 1 1 5
## cg11331837 1 1 1 1 1 5
## cg23066280 1 1 1 1 1 5
## cg11187460 1 1 1 1 1 5
## cg23159970 1 1 1 1 1 5
## cg14228103 1 1 1 1 1 5
## cg09584650 1 1 1 1 1 5
## cg04728936 1 1 1 1 1 5
## cg14507637 1 1 1 1 1 5
## cg23658987 1 1 1 1 1 5
## cg11019791 1 1 1 1 1 5
## cg05392160 1 1 1 1 1 5
## cg04412904 1 1 1 1 1 5
## cg25306893 1 1 1 1 1 5
## cg21812850 1 1 1 1 1 5
## cg24139837 1 1 1 1 1 5
## cg03737947 1 1 1 1 1 5
## cg12953206 1 1 1 1 1 5
## cg12702014 1 1 1 1 1 5
## cg20300784 1 1 1 1 1 5
## cg17429539 1 1 1 1 1 5
## cg05234269 1 1 1 1 1 5
## cg02356645 1 1 1 1 1 5
## cg03723481 1 1 1 1 1 5
## cg14564293 1 1 1 1 1 5
## cg25598710 1 1 1 1 1 5
## cg21507367 1 1 1 1 1 5
## cg04831745 1 1 1 1 1 5
## cg18526121 1 1 1 1 1 5
## cg04462915 1 1 1 1 1 5
## cg21783012 1 1 1 1 1 5
## cg16089727 1 1 1 1 1 5
## PC1 1 1 1 1 1 5
## cg02372404 1 1 1 1 1 5
## cg12228670 1 1 1 1 1 5
## cg05570109 1 1 1 1 1 5
## cg12784167 1 1 1 1 1 5
## cg10240127 1 1 1 1 1 5
## PC2 1 1 1 1 0 4
## cg23432430 1 1 1 1 0 4
## cg21243064 1 1 1 1 0 4
## cg02225060 1 1 1 1 0 4
## cg07480955 1 1 1 1 0 4
## cg19471911 1 1 1 0 1 4
## cg00962106 1 1 1 0 1 4
## cg09015880 1 1 1 0 1 4
## cg07158503 1 1 1 0 1 4
## cg00086247 1 1 1 1 0 4
## cg06634367 1 1 1 1 0 4
## cg16715186 1 1 1 0 1 4
## cg23923019 1 1 1 0 1 4
## cg11438323 1 1 1 0 1 4
## cg08138245 1 1 1 0 1 4
## cg13739190 1 1 1 0 1 4
## cg10091792 1 1 1 1 0 4
## cg12146221 1 1 1 1 0 4
## cg06403901 1 0 1 1 1 4
## cg20078646 1 1 1 0 1 4
## cg00084271 1 1 1 0 1 4
## cg18918831 1 1 1 0 1 4
## cg01921484 1 1 1 0 1 4
## cg23836570 1 1 1 1 0 4
## cg05130642 1 1 0 1 1 4
## cg19799454 1 1 1 1 0 4
## cg03395511 1 1 0 1 1 4
## cg09727210 1 1 1 0 1 4
## cg00154902 1 1 1 0 1 4
## cg22169467 1 0 1 1 1 4
## cg26069044 1 1 1 0 1 4
## cg17653352 1 1 1 0 1 4
## cg10738648 1 0 1 1 1 4
## cg22666875 1 1 1 0 1 4
## [ reached 'max' / getOption("max.print") -- omitted 154 rows ]
all_out_features <- union(combined_importance_freq_ordered_df$Feature, rownames(df_feature_Output_frequency))
# Note: the combined importance table used here is the one before filtering.
# Combine them based on the common-feature selection method.
# If a feature from the importance table is absent from the frequency table, add it with its presence values set to zero.
feature_output_df_full <- data.frame(Feature = all_out_features)
feature_output_df_full <- merge(feature_output_df_full, df_feature_Output_frequency, by.x = "Feature", by.y = "row.names", all.x = TRUE)
feature_output_df_full[is.na(feature_output_df_full)] <- 0
# For top_impAvg_ordered
all_output_impAvg_ordered_full <- data.frame(Feature = all_out_features)
all_output_impAvg_ordered_full <- merge(combined_importance_freq_ordered_df,
all_output_impAvg_ordered_full,
by.x = "Feature",
by.y = "Feature",
all.x = TRUE)
all_output_impAvg_ordered_full[is.na(all_output_impAvg_ordered_full)] <- 0
all_Output_combined_df_impAvg <- merge(feature_output_df_full,
all_output_impAvg_ordered_full,
by = "Feature",
all = TRUE)
print(head(feature_output_df_full))
## Feature LRM XGB ENM RF SVM Total_Count
## 1 age.now 0 1 0 1 1 3
## 2 cg00004073 1 1 1 1 1 5
## 3 cg00084271 1 1 1 0 1 4
## 4 cg00086247 1 1 1 1 0 4
## 5 cg00154902 1 1 1 0 1 4
## 6 cg00247094 0 0 1 0 1 2
print(head(all_output_impAvg_ordered_full))
## Feature Importance_LRM1 Importance_XGB Importance_ENM1 Importance_RF Importance_SVM Average_Importance
## 1 age.now 0.01043952 0.7697314 0.003919221 1.0000000 0.7142857 0.4996752
## 2 cg00004073 0.55107624 0.0000000 0.444329028 0.6879460 0.5714286 0.4509560
## 3 cg00084271 0.28081441 0.3024610 0.225486534 0.2954690 0.4285714 0.3065605
## 4 cg00086247 0.41379132 0.1954000 0.558326182 0.4872834 0.0000000 0.3309602
## 5 cg00154902 0.23356839 0.0192178 0.334486044 0.2903668 0.4285714 0.2612421
## 6 cg00247094 0.00000000 0.0000000 0.192524686 0.3183556 0.4285714 0.1878903
print(head(all_Output_combined_df_impAvg))
## Feature LRM XGB ENM RF SVM Total_Count Importance_LRM1 Importance_XGB Importance_ENM1 Importance_RF Importance_SVM Average_Importance
## 1 age.now 0 1 0 1 1 3 0.01043952 0.7697314 0.003919221 1.0000000 0.7142857 0.4996752
## 2 cg00004073 1 1 1 1 1 5 0.55107624 0.0000000 0.444329028 0.6879460 0.5714286 0.4509560
## 3 cg00084271 1 1 1 0 1 4 0.28081441 0.3024610 0.225486534 0.2954690 0.4285714 0.3065605
## 4 cg00086247 1 1 1 1 0 4 0.41379132 0.1954000 0.558326182 0.4872834 0.0000000 0.3309602
## 5 cg00154902 1 1 1 0 1 4 0.23356839 0.0192178 0.334486044 0.2903668 0.4285714 0.2612421
## 6 cg00247094 0 0 1 0 1 2 0.00000000 0.0000000 0.192524686 0.3183556 0.4285714 0.1878903
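The union-merge-and-zero-fill pattern used above can be illustrated on toy frames (the feature names here are hypothetical, chosen only for the example):

```r
# Sketch of the pattern above: take the union of feature names, left-join the
# presence table onto it, and replace the NAs introduced by the join with zeros.
freq_df <- data.frame(LRM = c(1, 0), Total_Count = c(1, 0),
                      row.names = c("cgA", "cgB"))
imp_df  <- data.frame(Feature = c("cgA", "cgC"), Importance = c(0.9, 0.4),
                      stringsAsFactors = FALSE)

all_feats <- union(imp_df$Feature, rownames(freq_df))
full <- merge(data.frame(Feature = all_feats, stringsAsFactors = FALSE),
              freq_df, by.x = "Feature", by.y = "row.names", all.x = TRUE)
full[is.na(full)] <- 0  # features missing from the frequency table get zeros
```

After the fill, `cgC` (present only in the importance table) carries all-zero presence values, matching how `feature_output_df_full` is built above.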
Finally, keep the mutually important features that appear in at least half of the models (i.e. 3 in our case) to obtain the top selected list of important features.
if(METHOD_FEATURE_FLAG == 6){
df_process_frequency_FeatureName <- rownames(df_feature_Output_frequency[df_feature_Output_frequency$Total_Count>=3,])
df_process_Output_freq<-processed_data_m6_df[,c("DX",df_process_frequency_FeatureName)]
output_Frequency_Feature <- processed_data_m6[,c("DX",df_process_frequency_FeatureName)]
print(head(output_Frequency_Feature))
print(paste("The number of final used features of common importance method:", length(df_process_frequency_FeatureName) ))
print(df_process_frequency_FeatureName)
print(head(df_process_Output_freq))
}
if(METHOD_FEATURE_FLAG == 5){
df_process_frequency_FeatureName <- rownames(df_feature_Output_frequency[df_feature_Output_frequency$Total_Count>=3,])
df_process_Output_freq<-processed_data_m5_df[,c("DX",df_process_frequency_FeatureName)]
output_Frequency_Feature <- processed_data_m5[,c("DX",df_process_frequency_FeatureName)]
print(head(output_Frequency_Feature))
print(paste("The number of final used features of common importance method:", length(df_process_frequency_FeatureName) ))
print(df_process_frequency_FeatureName)
print(head(df_process_Output_freq))
}
## # A tibble: 6 × 276
## DX cg27272246 cg14710850 cg00004073 cg13405878 cg02981548 cg20685672 cg08788093 cg03924089 cg16652920 cg24433124 cg12543766 cg14687298 cg17129965 cg06833284 cg11169344 cg14168080 cg26081710
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 MCI 0.862 0.805 0.0293 0.455 0.134 0.671 0.0391 0.792 0.944 0.132 0.510 0.0421 0.897 0.913 0.672 0.419 0.875
## 2 CN 0.871 0.809 0.0279 0.786 0.522 0.793 0.609 0.737 0.943 0.599 0.887 0.148 0.881 0.900 0.822 0.442 0.920
## 3 CN 0.810 0.829 0.646 0.758 0.510 0.661 0.884 0.851 0.946 0.819 0.0282 0.243 0.886 0.610 0.594 0.436 0.880
## 4 MCI 0.769 0.850 0.412 0.448 0.568 0.0829 0.522 0.869 0.953 0.592 0.818 0.513 0.874 0.0381 0.868 0.946 0.917
## 5 CN 0.440 0.821 0.393 0.340 0.508 0.845 0.434 0.748 0.949 0.574 0.457 0.0362 0.882 0.915 0.155 0.399 0.923
## 6 MCI 0.750 0.845 0.404 0.734 0.530 0.657 0.773 0.753 0.949 0.606 0.804 0.241 0.776 0.901 0.623 0.950 0.882
## # ℹ 258 more variables: cg02631626 <dbl>, cg17042243 <dbl>, cg08861434 <dbl>, cg07640670 <dbl>, cg20398163 <dbl>, cg03979311 <dbl>, cg10978526 <dbl>, cg22933800 <dbl>, cg07028768 <dbl>,
## # cg04248279 <dbl>, cg07504457 <dbl>, cg14175932 <dbl>, cg26219488 <dbl>, cg06231502 <dbl>, cg06115838 <dbl>, cg06483046 <dbl>, cg07104639 <dbl>, cg21392220 <dbl>, cg18819889 <dbl>,
## # cg08779649 <dbl>, cg08198851 <dbl>, cg17186592 <dbl>, cg06546677 <dbl>, cg26705599 <dbl>, cg23517115 <dbl>, cg00819121 <dbl>, cg11268585 <dbl>, cg13815695 <dbl>, cg06286533 <dbl>,
## # cg23352245 <dbl>, cg26679884 <dbl>, cg18816397 <dbl>, cg15633912 <dbl>, cg10681981 <dbl>, cg01128042 <dbl>, cg02772171 <dbl>, cg06394820 <dbl>, cg17738613 <dbl>, cg04242342 <dbl>,
## # cg22741595 <dbl>, cg02887598 <dbl>, cg22071943 <dbl>, cg04971651 <dbl>, cg19097407 <dbl>, cg15586958 <dbl>, cg08857872 <dbl>, cg02621446 <dbl>, cg00553601 <dbl>, cg00767423 <dbl>,
## # cg18285382 <dbl>, cg15098922 <dbl>, cg15138543 <dbl>, cg08745107 <dbl>, cg00696044 <dbl>, cg03749159 <dbl>, cg21415084 <dbl>, cg05155812 <dbl>, cg22112152 <dbl>, cg14293999 <dbl>,
## # cg16655091 <dbl>, cg03327352 <dbl>, cg06961873 <dbl>, cg08880261 <dbl>, cg11286989 <dbl>, cg06118351 <dbl>, cg25366315 <dbl>, cg26853071 <dbl>, cg25436480 <dbl>, cg26983017 <dbl>, …
## [1] "The number of final used features of common importance method: 275"
## [1] "cg27272246" "cg14710850" "cg00004073" "cg13405878" "cg02981548" "cg20685672" "cg08788093" "cg03924089" "cg16652920" "cg24433124" "cg12543766" "cg14687298" "cg17129965" "cg06833284" "cg11169344"
## [16] "cg14168080" "cg26081710" "cg02631626" "cg17042243" "cg08861434" "cg07640670" "cg20398163" "cg03979311" "cg10978526" "cg22933800" "cg07028768" "cg04248279" "cg07504457" "cg14175932" "cg26219488"
## [31] "cg06231502" "cg06115838" "cg06483046" "cg07104639" "cg21392220" "cg18819889" "cg08779649" "cg08198851" "cg17186592" "cg06546677" "cg26705599" "cg23517115" "cg00819121" "cg11268585" "cg13815695"
## [46] "cg06286533" "cg23352245" "cg26679884" "cg18816397" "cg15633912" "cg10681981" "cg01128042" "cg02772171" "cg06394820" "cg17738613" "cg04242342" "cg22741595" "cg02887598" "cg22071943" "cg04971651"
## [61] "cg19097407" "cg15586958" "cg08857872" "cg02621446" "cg00553601" "cg00767423" "cg18285382" "cg15098922" "cg15138543" "cg08745107" "cg00696044" "cg03749159" "cg21415084" "cg05155812" "cg22112152"
## [76] "cg14293999" "cg16655091" "cg03327352" "cg06961873" "cg08880261" "cg11286989" "cg06118351" "cg25366315" "cg26853071" "cg25436480" "cg26983017" "cg26901661" "cg21139150" "cg10738049" "cg03129555"
## [91] "cg17018422" "cg04664583" "cg03660162" "cg15600437" "cg00322003" "cg11331837" "cg23066280" "cg11187460" "cg23159970" "cg14228103" "cg09584650" "cg04728936" "cg14507637" "cg23658987" "cg11019791"
## [106] "cg05392160" "cg04412904" "cg25306893" "cg21812850" "cg24139837" "cg03737947" "cg12953206" "cg12702014" "cg20300784" "cg17429539" "cg05234269" "cg02356645" "cg03723481" "cg14564293" "cg25598710"
## [121] "cg21507367" "cg04831745" "cg18526121" "cg04462915" "cg21783012" "cg16089727" "PC1" "cg02372404" "cg12228670" "cg05570109" "cg12784167" "cg10240127" "PC2" "cg23432430" "cg21243064"
## [136] "cg02225060" "cg07480955" "cg19471911" "cg00962106" "cg09015880" "cg07158503" "cg00086247" "cg06634367" "cg16715186" "cg23923019" "cg11438323" "cg08138245" "cg13739190" "cg10091792" "cg12146221"
## [151] "cg06403901" "cg20078646" "cg00084271" "cg18918831" "cg01921484" "cg23836570" "cg05130642" "cg19799454" "cg03395511" "cg09727210" "cg00154902" "cg22169467" "cg26069044" "cg17653352" "cg10738648"
## [166] "cg22666875" "cg17268094" "cg05876883" "cg16779438" "cg09289202" "cg22542451" "cg16771215" "cg04316537" "cg25879395" "cg18150287" "cg27160885" "cg08514194" "cg14623940" "cg22535849" "cg06715136"
## [181] "cg09247979" "cg15775217" "cg19377607" "cg15535896" "cg08669168" "cg01549082" "cg07634717" "cg12063064" "cg24883219" "cg13573375" "cg06960717" "cg05850457" "cg19301366" "cg06864789" "cg14240646"
## [196] "cg03635532" "cg18993517" "cg27577781" "cg25649515" "cg00939409" "cg07478795" "cg06371647" "cg04645024" "cg18949721" "cg09216282" "cg06697310" "cg21388339" "cg15501526" "cg27086157" "cg21209485"
## [211] "cg20208879" "cg12012426" "cg01023242" "cg11401796" "cg02246922" "cg10039445" "cg26948066" "cg05891136" "cg12776173" "cg08914944" "cg14582632" "cg05799088" "cg11540596" "cg16536985" "cg03549208"
## [226] "cg10666341" "cg01008088" "cg03600007" "cg16180556" "cg15985500" "cg03982462" "cg02550738" "cg10993865" "cg14192979" "cg07227024" "cg09708852" "cg16571124" "cg18857647" "cg16202259" "cg12421087"
## [241] "cg03796003" "cg10788927" "cg15491125" "cg06880438" "cg12501287" "cg24470466" "cg16405337" "cg00272795" "cg04875706" "cg11882358" "cg14307563" "cg06012903" "cg11227702" "age.now" "cg04718469"
## [256] "cg25208881" "cg00689685" "cg12333628" "cg11133939" "cg01933473" "cg11314779" "cg24634455" "cg05161773" "cg02464073" "cg04768387" "cg24851651" "cg01910713" "cg14649234" "cg08896901" "cg03088219"
## [271] "cg26642936" "cg01153376" "cg17061760" "cg04888234" "cg09785377"
## DX cg27272246 cg14710850 cg00004073 cg13405878 cg02981548 cg20685672 cg08788093 cg03924089 cg16652920 cg24433124 cg12543766 cg14687298 cg17129965 cg06833284 cg11169344 cg14168080
## 200223270003_R02C01 MCI 0.8615873 0.8048592 0.02928535 0.4549662 0.1342571 0.6712101 0.03911678 0.7920449 0.9436000 0.1316610 0.51028134 0.04206702 0.8972140 0.9125144 0.6720163 0.4190123
## 200223270003_R03C01 CN 0.8705287 0.8090950 0.02787198 0.7858042 0.5220037 0.7932091 0.60934160 0.7370283 0.9431222 0.5987648 0.88741539 0.14813581 0.8806673 0.9003482 0.8215477 0.4420256
## 200223270003_R06C01 CN 0.8103777 0.8285902 0.64576463 0.7583938 0.5098965 0.6613646 0.88380243 0.8506756 0.9457161 0.8188082 0.02818501 0.24260002 0.8857237 0.6097933 0.5941114 0.4355521
## cg26081710 cg02631626 cg17042243 cg08861434 cg07640670 cg20398163 cg03979311 cg10978526 cg22933800 cg07028768 cg04248279 cg07504457 cg14175932 cg26219488 cg06231502 cg06115838
## 200223270003_R02C01 0.8751040 0.6280766 0.2502905 0.8768306 0.58296513 0.1728144 0.86644909 0.5671930 0.4830774 0.4496851 0.8534976 0.7116230 0.5746953 0.9336638 0.7784451 0.8847724
## 200223270003_R03C01 0.9198212 0.1951736 0.2933475 0.4352647 0.55225610 0.8728944 0.06199853 0.9095713 0.4142525 0.8536078 0.8458854 0.6854539 0.8779027 0.9134707 0.7964278 0.8447916
## 200223270003_R06C01 0.8801892 0.2699849 0.2725457 0.8698813 0.04058533 0.2623391 0.72615553 0.8945157 0.3956683 0.8356936 0.8332786 0.7205633 0.7288239 0.9261878 0.7706160 0.8805585
## cg06483046 cg07104639 cg21392220 cg18819889 cg08779649 cg08198851 cg17186592 cg06546677 cg26705599 cg23517115 cg00819121 cg11268585 cg13815695 cg06286533 cg23352245 cg26679884
## 200223270003_R02C01 0.04383925 0.6772717 0.8726204 0.9156157 0.44449401 0.6578905 0.9230463 0.4472216 0.8585917 0.2151144 0.9207001 0.2521544 0.9267057 0.2734841 0.9377232 0.6793815
## 200223270003_R03C01 0.50720277 0.7123879 0.8563905 0.9004455 0.45076825 0.6578186 0.8593448 0.8484609 0.8613854 0.9131440 0.9281472 0.8535791 0.6859729 0.9354924 0.9375774 0.1848705
## 200223270003_R06C01 0.89604910 0.8099688 0.8466199 0.9054439 0.04810217 0.1272153 0.8467599 0.5636023 0.4332832 0.8328364 0.9327211 0.9121931 0.6509046 0.8696546 0.5932742 0.1701734
## cg18816397 cg15633912 cg10681981 cg01128042 cg02772171 cg06394820 cg17738613 cg04242342 cg22741595 cg02887598 cg22071943 cg04971651 cg19097407 cg15586958 cg08857872 cg02621446
## 200223270003_R02C01 0.5472925 0.1605530 0.7035090 0.9113420 0.9182018 0.8513195 0.6879612 0.8206769 0.6525533 0.04020908 0.8705217 0.8902474 0.1417931 0.9058263 0.3395280 0.8731313
## 200223270003_R03C01 0.4940355 0.9333421 0.7382662 0.5328806 0.5660559 0.8695521 0.6582258 0.8167892 0.1730013 0.67073881 0.2442648 0.9219452 0.8367297 0.8957526 0.8181845 0.8095534
## 200223270003_R06C01 0.5337018 0.8737362 0.6971989 0.5222757 0.8995479 0.4415020 0.1022257 0.8040357 0.1550739 0.73408417 0.2644581 0.9035233 0.2276425 0.9121763 0.2970779 0.7511582
## cg00553601 cg00767423 cg18285382 cg15098922 cg15138543 cg08745107 cg00696044 cg03749159 cg21415084 cg05155812 cg22112152 cg14293999 cg16655091 cg03327352 cg06961873 cg08880261
## 200223270003_R02C01 0.05601299 0.9298253 0.3202927 0.9286092 0.7734778 0.02921338 0.55608424 0.9355921 0.8374415 0.4514427 0.8476101 0.2836710 0.6055295 0.8851712 0.5335591 0.40655904
## 200223270003_R03C01 0.58957701 0.2651854 0.2930577 0.9027517 0.2949313 0.78542320 0.07552381 0.9153921 0.8509420 0.9070932 0.8014136 0.9172023 0.7053336 0.8786878 0.5472606 0.85616966
## 200223270003_R06C01 0.62426500 0.8667808 0.8923595 0.8525611 0.2496147 0.02709928 0.79270858 0.9255807 0.8378237 0.4107396 0.7897897 0.9168166 0.8724479 0.3042310 0.9415177 0.03280808
## cg11286989 cg06118351 cg25366315 cg26853071 cg25436480 cg26983017 cg26901661 cg21139150 cg10738049 cg03129555 cg17018422 cg04664583 cg03660162 cg15600437 cg00322003 cg11331837
## 200223270003_R02C01 0.7590008 0.3633940 0.9182318 0.4233820 0.8425160 0.89868232 0.8951971 0.01853264 0.5441211 0.6079616 0.5262747 0.5572814 0.8691767 0.4885353 0.1759911 0.03692842
## 200223270003_R03C01 0.8533989 0.4714860 0.9209800 0.7451354 0.4994032 0.03145466 0.8754981 0.43223243 0.5232715 0.5785498 0.9029604 0.5881190 0.5160770 0.4894487 0.5702070 0.57150125
## 200223270003_R06C01 0.7313884 0.8655962 0.8972984 0.4228079 0.3494312 0.84677625 0.9021064 0.43772680 0.4875473 0.9137818 0.5100750 0.9352717 0.9026304 0.8551374 0.3077122 0.03182862
## cg23066280 cg11187460 cg23159970 cg14228103 cg09584650 cg04728936 cg14507637 cg23658987 cg11019791 cg05392160 cg04412904 cg25306893 cg21812850 cg24139837 cg03737947 cg12953206
## 200223270003_R02C01 0.07247841 0.03672179 0.61817246 0.9141064 0.08230254 0.2172057 0.9051258 0.79757644 0.8112324 0.9328933 0.05088595 0.6265392 0.7920645 0.07404605 0.91824910 0.2364836
## 200223270003_R03C01 0.57174588 0.92516409 0.57492600 0.8591302 0.09661586 0.1925451 0.9009460 0.07511718 0.7831231 0.2576881 0.07717659 0.8330282 0.7688711 0.04183445 0.92067153 0.2338141
## 200223270003_R06C01 0.80814756 0.03109553 0.03288909 0.1834348 0.52399749 0.2379376 0.9013686 0.10177571 0.4353250 0.8920726 0.08253743 0.6175380 0.7702792 0.05657120 0.03638091 0.6638030
## cg12702014 cg20300784 cg17429539 cg05234269 cg02356645 cg03723481 cg14564293 cg25598710 cg21507367 cg04831745 cg18526121 cg04462915 cg21783012 cg16089727 PC1 cg02372404
## 200223270003_R02C01 0.7704049 0.86585964 0.7860900 0.93848584 0.5105903 0.4347333 0.52089591 0.3105752 0.9268560 0.61984995 0.4519781 0.03224861 0.9142369 0.86748697 -0.214185447 0.03598249
## 200223270003_R03C01 0.7848681 0.86609999 0.7100923 0.57461229 0.5833923 0.9007774 0.04000662 0.3088142 0.9290102 0.71214149 0.4762313 0.50740695 0.6694884 0.54996692 -0.172761185 0.02767285
## 200223270003_R06C01 0.8065993 0.03091187 0.7660838 0.02467208 0.5701428 0.8947417 0.04959460 0.8538820 0.9039559 0.06871768 0.4833367 0.02700644 0.9070112 0.05876736 -0.003667305 0.03127855
## cg12228670 cg05570109 cg12784167 cg10240127 PC2 cg23432430 cg21243064 cg02225060 cg07480955 cg19471911 cg00962106 cg09015880 cg07158503 cg00086247 cg06634367 cg16715186
## 200223270003_R02C01 0.8632174 0.3466611 0.81503498 0.9250553 0.01470293 0.9482702 0.5191606 0.6828159 0.3874638 0.6334393 0.9124898 0.5101716 0.5777146 0.1761275 0.8695793 0.2742789
## 200223270003_R03C01 0.8496212 0.5866750 0.02811410 0.9403255 0.05745834 0.9455418 0.9167649 0.8265195 0.3916889 0.8437175 0.5375751 0.8402106 0.6203543 0.2045043 0.9512930 0.7946153
## 200223270003_R06C01 0.8738949 0.4046471 0.03073269 0.9056974 0.08372861 0.9418716 0.4862205 0.5209552 0.4043390 0.6127952 0.5040948 0.8472063 0.6236025 0.6901217 0.9544163 0.8124316
## cg23923019 cg11438323 cg08138245 cg13739190 cg10091792 cg12146221 cg06403901 cg20078646 cg00084271 cg18918831 cg01921484 cg23836570 cg05130642 cg19799454 cg03395511 cg09727210
## 200223270003_R02C01 0.8555018 0.4863471 0.8115760 0.8510103 0.8670733 0.2049284 0.92790690 0.06198170 0.8103611 0.4891660 0.9098550 0.58688450 0.8575504 0.9178930 0.4491605 0.4240111
## 200223270003_R03C01 0.3058914 0.8984559 0.1109940 0.8358482 0.5864221 0.1814927 0.04783341 0.89537412 0.7877006 0.5333801 0.9093137 0.54259383 0.8644077 0.9106247 0.4835967 0.8812928
## 200223270003_R06C01 0.8108207 0.8722772 0.7444698 0.8419471 0.6087997 0.8619250 0.05253626 0.08725521 0.7706165 0.6406575 0.9204487 0.03267304 0.3661324 0.9066551 0.5523959 0.8493743
## cg00154902 cg22169467 cg26069044 cg17653352 cg10738648 cg22666875 cg17268094 cg05876883 cg16779438 cg09289202 cg22542451 cg16771215 cg04316537 cg25879395 cg18150287 cg27160885
## 200223270003_R02C01 0.5137741 0.3095010 0.9240187 0.9269778 0.44931577 0.8177182 0.5774753 0.9039064 0.8826150 0.4361103 0.5884356 0.88389723 0.8074830 0.88130864 0.7685695 0.2231606
## 200223270003_R03C01 0.8540746 0.2978585 0.9407223 0.9086951 0.49894016 0.8291957 0.9003262 0.9223308 0.5466924 0.4397504 0.8337068 0.07196933 0.8453340 0.02603438 0.7519166 0.8263885
## 200223270003_R06C01 0.8188126 0.8955853 0.9332131 0.9341775 0.05552024 0.3694180 0.8789368 0.4697980 0.8629492 0.4193555 0.8125084 0.09949974 0.4351695 0.91060615 0.2501173 0.2121179
## cg08514194 cg14623940 cg22535849 cg06715136 cg09247979 cg15775217 cg19377607 cg15535896 cg08669168 cg01549082 cg07634717 cg12063064 cg24883219 cg13573375 cg06960717 cg05850457
## 200223270003_R02C01 0.9128478 0.7623774 0.8847704 0.3400192 0.5070956 0.5707441 0.05377464 0.3382952 0.9226769 0.2924138 0.7483382 0.9357515 0.6430473 0.8670419 0.7030978 0.8183013
## 200223270003_R03C01 0.2613138 0.8732905 0.8609966 0.9259109 0.5706177 0.9168327 0.90570746 0.9253926 0.9164547 0.7065693 0.8254434 0.9436901 0.6822115 0.1733934 0.7653402 0.8313023
## 200223270003_R06C01 0.9202187 0.8661720 0.8808022 0.9079807 0.5090215 0.6042521 0.06636174 0.3320191 0.6362087 0.2895440 0.8181246 0.5490657 0.5296903 0.8888246 0.7206218 0.8161364
## cg19301366 cg06864789 cg14240646 cg03635532 cg18993517 cg27577781 cg25649515 cg00939409 cg07478795 cg06371647 cg04645024 cg18949721 cg09216282 cg06697310 cg21388339 cg15501526
## 200223270003_R02C01 0.8831393 0.05369415 0.5391334 0.8416733 0.2091538 0.8143535 0.9279829 0.2652180 0.8911007 0.8336894 0.7366541 0.2334245 0.9349248 0.8454609 0.2756268 0.6362531
## 200223270003_R03C01 0.8072679 0.46053125 0.2538363 0.8262538 0.2665896 0.8113185 0.9235753 0.8882671 0.9095543 0.8198684 0.8454827 0.2437792 0.9244259 0.8653044 0.2102269 0.6319253
## 200223270003_R06C01 0.8796022 0.87513655 0.1864902 0.8450480 0.2574003 0.8144274 0.5895839 0.8842646 0.8905903 0.8069537 0.0871902 0.2523095 0.9263996 0.2405168 0.7649181 0.7435100
## cg27086157 cg21209485 cg20208879 cg12012426 cg01023242 cg11401796 cg02246922 cg10039445 cg26948066 cg05891136 cg12776173 cg08914944 cg14582632 cg05799088 cg11540596 cg16536985
## 200223270003_R02C01 0.9224112 0.8865053 0.66986658 0.9165048 0.7210683 0.8453050 0.7301201 0.8833873 0.4685225 0.7797403 0.1038804 0.63423942 0.8475098 0.9023317 0.9238951 0.5789643
## 200223270003_R03C01 0.9219304 0.8714878 0.02423079 0.9434768 0.9032685 0.4319176 0.9447019 0.8954055 0.5026045 0.3310206 0.8730635 0.04392811 0.5526692 0.8779381 0.8926595 0.5418687
## 200223270003_R06C01 0.3224986 0.2292550 0.61769424 0.9220044 0.7831190 0.4370329 0.7202230 0.8832807 0.9101976 0.7965298 0.7009491 0.06893322 0.5288675 0.6887230 0.8820252 0.8392044
## cg03549208 cg10666341 cg01008088 cg03600007 cg16180556 cg15985500 cg03982462 cg02550738 cg10993865 cg14192979 cg07227024 cg09708852 cg16571124 cg18857647 cg16202259 cg12421087
## 200223270003_R02C01 0.9014487 0.9046648 0.8424817 0.5658487 0.39300141 0.8555262 0.8562777 0.6201457 0.9173768 0.06336040 0.04553128 0.2843446 0.9282854 0.8582332 0.9548726 0.5647607
## 200223270003_R03C01 0.8381784 0.6731062 0.2417656 0.6018832 0.07312155 0.8312198 0.6023731 0.9011727 0.9096170 0.06019651 0.05004286 0.2897826 0.9206431 0.8394132 0.3713483 0.5399655
## 200223270003_R06C01 0.9097817 0.6443180 0.2618620 0.8611166 0.20051805 0.8492103 0.8778458 0.9085849 0.4904519 0.52114282 0.06152206 0.8896436 0.9276842 0.2647491 0.4852461 0.5400348
## cg03796003 cg10788927 cg15491125 cg06880438 cg12501287 cg24470466 cg16405337 cg00272795 cg04875706 cg11882358 cg14307563 cg06012903 cg11227702 age.now cg04718469 cg25208881
## 200223270003_R02C01 0.89227099 0.8973154 0.9066635 0.8285145 0.4654925 0.7725300 0.6177291 0.46365138 0.5790542 0.89136326 0.1855966 0.7964595 0.86486075 82.4 0.8687522 0.1851956
## 200223270003_R03C01 0.86011668 0.2021398 0.3850991 0.7988881 0.5126917 0.9041432 0.6131717 0.82839260 0.9255066 0.04943344 0.8916957 0.1933431 0.49184121 78.6 0.7256813 0.9092286
## 200223270003_R06C01 0.08518098 0.2053075 0.9091504 0.7839538 0.9189144 0.1206738 0.6098664 0.07231279 0.9155843 0.80176322 0.8750052 0.1960773 0.02543724 80.4 0.8521881 0.9265502
## cg00689685 cg12333628 cg11133939 cg01933473 cg11314779 cg24634455 cg05161773 cg02464073 cg04768387 cg24851651 cg01910713 cg14649234 cg08896901 cg03088219 cg26642936 cg01153376
## 200223270003_R02C01 0.7019389 0.9227884 0.1282694 0.2589014 0.0242134 0.7796391 0.4120912 0.4842537 0.3131047 0.03674702 0.8573169 0.05165754 0.3581911 0.844002862 0.7619266 0.4872148
## 200223270003_R03C01 0.8634268 0.9092861 0.5920898 0.6726133 0.8966100 0.5188241 0.4154907 0.4998933 0.9465814 0.05358297 0.8538850 0.79015014 0.2467071 0.007435243 0.7023413 0.9639670
## 200223270003_R06C01 0.6378795 0.5084647 0.5127706 0.2642560 0.8908661 0.5325725 0.8526849 0.9077933 0.9098563 0.05968923 0.8110366 0.65413166 0.9225209 0.120155222 0.7099380 0.2242410
## cg17061760 cg04888234 cg09785377
## 200223270003_R02C01 0.08726914 0.8379655 0.9162088
## 200223270003_R03C01 0.59377488 0.4376314 0.9226292
## 200223270003_R06C01 0.83354475 0.8039047 0.6405193
## [ reached 'max' / getOption("max.print") -- omitted 3 rows ]
if(METHOD_FEATURE_FLAG == 4){
  # Keep features selected by at least 3 of the 5 trained models (common-importance / frequency method)
  df_process_frequency_FeatureName <- rownames(df_feature_Output_frequency[df_feature_Output_frequency$Total_Count >= 3, ])
  df_process_Output_freq <- processed_data_m4_df[, c("DX", df_process_frequency_FeatureName)]
  output_Frequency_Feature <- processed_data_m4[, c("DX", df_process_frequency_FeatureName)]
  print(head(output_Frequency_Feature))
  print(paste("The number of final used features of common importance method:", length(df_process_frequency_FeatureName)))
  print(df_process_frequency_FeatureName)
  print(head(df_process_Output_freq))
}
if(METHOD_FEATURE_FLAG == 3){
  df_process_frequency_FeatureName <- rownames(df_feature_Output_frequency[df_feature_Output_frequency$Total_Count >= 3, ])
  df_process_Output_freq <- processed_data_m3_df[, c("DX", df_process_frequency_FeatureName)]
  output_Frequency_Feature <- processed_data_m3[, c("DX", df_process_frequency_FeatureName)]
  print(head(output_Frequency_Feature))
  print(paste("The number of final used features of common importance method:", length(df_process_frequency_FeatureName)))
  print(df_process_frequency_FeatureName)
  print(head(df_process_Output_freq))
}
if(METHOD_FEATURE_FLAG == 1){
  df_process_frequency_FeatureName <- rownames(df_feature_Output_frequency[df_feature_Output_frequency$Total_Count >= 3, ])
  df_process_Output_freq <- processed_data_m1_df[, c("DX", df_process_frequency_FeatureName)]
  output_Frequency_Feature <- processed_data_m1[, c("DX", df_process_frequency_FeatureName)]
  print(head(output_Frequency_Feature))
  print(paste("The number of final used features of common importance method:", length(df_process_frequency_FeatureName)))
  print(df_process_frequency_FeatureName)
  print(head(df_process_Output_freq))
}
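The three branches above differ only in which processed data set they subset. As a minimal sketch (assuming the `processed_data_m*` data frames all carry a `DX` column and that `df_feature_Output_frequency` has the layout shown in the output; `select_frequency_features` is a hypothetical helper, not part of the pipeline), the dispatch could be collapsed into a single lookup keyed by the flag value:

```r
# Hypothetical consolidation of the per-flag branches: look up the processed
# data set in a named list, then subset once by the frequently-selected features.
select_frequency_features <- function(flag, freq_table, data_sets, min_count = 3) {
  # Features chosen by at least `min_count` of the trained models
  feature_names <- rownames(freq_table[freq_table$Total_Count >= min_count, , drop = FALSE])
  data_sets[[as.character(flag)]][, c("DX", feature_names)]
}
```

It could then be called as, e.g., `select_frequency_features(METHOD_FEATURE_FLAG, df_feature_Output_frequency, list("1" = processed_data_m1_df, "3" = processed_data_m3_df, "4" = processed_data_m4_df))`.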
Selected_Frequency_Feature_importance <- all_Output_combined_df_impAvg[all_Output_combined_df_impAvg$Total_Count >= 3, ]
print(Selected_Frequency_Feature_importance)
## Feature LRM XGB ENM RF SVM Total_Count Importance_LRM1 Importance_XGB Importance_ENM1 Importance_RF Importance_SVM Average_Importance
## 1 age.now 0 1 0 1 1 3 0.010439518 0.769731412 0.003919221 1.0000000 0.7142857 0.4996752
## 2 cg00004073 1 1 1 1 1 5 0.551076235 0.000000000 0.444329028 0.6879460 0.5714286 0.4509560
## 3 cg00084271 1 1 1 0 1 4 0.280814405 0.302460974 0.225486534 0.2954690 0.4285714 0.3065605
## 4 cg00086247 1 1 1 1 0 4 0.413791322 0.195400021 0.558326182 0.4872834 0.0000000 0.3309602
## 5 cg00154902 1 1 1 0 1 4 0.233568385 0.019217805 0.334486044 0.2903668 0.4285714 0.2612421
## 7 cg00272795 1 0 0 1 1 3 0.067906325 0.000000000 0.130046643 0.4481192 0.4285714 0.2149287
## 8 cg00322003 1 1 1 1 1 5 0.165354881 0.188261954 0.249693796 0.6656887 0.4285714 0.3395141
## 10 cg00553601 1 1 1 1 1 5 0.229896890 0.187170032 0.209414127 0.4884980 0.4285714 0.3087101
## 11 cg00689685 0 1 0 1 1 3 0.032835492 0.193792637 0.151642960 0.3656674 0.4285714 0.2345020
## 12 cg00696044 1 1 1 1 1 5 0.216438037 0.052576652 0.392918580 0.6363369 0.4285714 0.3453683
## 13 cg00767423 1 1 1 1 1 5 0.228632194 0.278999698 0.282621107 0.5565249 0.7142857 0.4122127
## 14 cg00819121 1 1 1 1 1 5 0.281924651 0.115460856 0.342415549 0.4831427 0.5714286 0.3588745
## 15 cg00939409 1 1 1 0 1 4 0.068160562 0.013876680 0.173863433 0.3262195 0.5714286 0.2307098
## 16 cg00962106 1 1 1 0 1 4 0.472113534 0.447191215 0.569408229 0.3343471 0.4285714 0.4503263
## 17 cg01008088 1 0 1 1 0 3 0.199314537 0.000000000 0.237547463 0.5297409 0.2857143 0.2504634
## 18 cg01023242 0 1 1 1 1 4 0.000000000 0.120179314 0.179531864 0.4221621 0.4285714 0.2300889
## 19 cg01128042 1 1 1 1 1 5 0.266497715 0.000000000 0.277557189 0.5012991 0.5714286 0.3233565
## 20 cg01153376 0 0 1 1 1 3 0.000000000 0.000000000 0.223470745 0.5200018 0.4285714 0.2344088
## 21 cg01549082 1 1 1 1 0 4 0.120386765 0.024271030 0.198689423 0.6421079 0.2857143 0.2542339
## 22 cg01910713 0 1 1 0 1 3 0.046596677 0.022604798 0.199171049 0.2791649 0.5714286 0.2237932
## 23 cg01921484 1 1 1 0 1 4 0.252427772 0.375675516 0.439906641 0.3245732 0.4285714 0.3642309
## 24 cg01933473 0 1 0 1 1 3 0.000000000 0.105547551 0.091456074 0.5801758 0.5714286 0.2697216
## 26 cg02225060 1 1 1 1 0 4 0.493191819 0.378693009 0.469986827 0.5782728 0.2857143 0.4411717
## 27 cg02246922 0 1 1 1 1 4 0.031619850 0.069852716 0.169599907 0.6861057 0.4285714 0.2771499
## 28 cg02356645 1 1 1 1 1 5 0.117737486 0.262069654 0.231593813 0.4918152 0.4285714 0.3063575
## 29 cg02372404 1 1 1 1 1 5 0.069048535 0.081867155 0.266132830 0.4534795 0.5714286 0.2883913
## 30 cg02464073 0 1 0 1 1 3 0.055050027 0.046083310 0.118399370 0.7278353 0.4285714 0.2751879
## 32 cg02550738 1 1 0 0 1 3 0.169769864 0.044173919 0.155223000 0.2793502 0.4285714 0.2154177
## 33 cg02621446 1 1 1 1 1 5 0.231889260 0.448074437 0.335879800 0.6357484 0.4285714 0.4160327
## 34 cg02631626 1 1 1 1 1 5 0.423750900 0.128585418 0.337131252 0.4899088 0.4285714 0.3615896
## 35 cg02772171 1 1 1 1 1 5 0.264483475 0.107310824 0.377477633 0.4551138 0.7142857 0.3837343
## 36 cg02887598 1 1 1 1 1 5 0.248190819 0.137089846 0.310862991 0.4102927 0.7142857 0.3641444
## 37 cg02981548 1 1 1 1 1 5 0.545935793 0.173912403 0.579509796 0.4337773 0.4285714 0.4323413
## 38 cg03088219 0 1 0 1 1 3 0.000000000 0.006847847 0.150267331 0.4450659 0.5714286 0.2347219
## 39 cg03129555 1 1 1 1 1 5 0.167803490 0.019713473 0.212699816 0.5338546 0.4285714 0.2725286
## 41 cg03327352 1 1 1 1 1 5 0.203332863 0.144291233 0.263393514 0.3778731 0.4285714 0.2834924
## 42 cg03395511 1 1 0 1 1 4 0.234081199 0.002985802 0.154826939 0.6515604 0.5714286 0.3229766
## 43 cg03549208 1 0 0 1 1 3 0.212050844 0.000000000 0.153075025 0.4601636 0.4285714 0.2507722
## 44 cg03600007 1 0 1 0 1 3 0.194149768 0.000000000 0.220418917 0.3134360 0.4285714 0.2313152
## 45 cg03635532 1 1 1 1 0 4 0.085520396 0.118047855 0.198452358 0.5428861 0.2857143 0.2461242
## 46 cg03660162 1 1 1 1 1 5 0.166660478 0.000000000 0.368471516 0.4414535 0.5714286 0.3096028
## 48 cg03723481 1 1 1 1 1 5 0.113954985 0.152610795 0.241509887 0.5783986 0.5714286 0.3315806
## 49 cg03737947 1 1 1 1 1 5 0.131998835 0.056890740 0.197471878 0.3622478 0.7142857 0.2925790
## 50 cg03749159 1 1 1 1 1 5 0.210458191 0.044933916 0.225343921 0.8358160 0.4285714 0.3490247
## 51 cg03796003 1 1 0 1 0 3 0.100770654 0.008677713 0.146221810 0.5123194 0.2857143 0.2107408
## 52 cg03924089 1 1 1 1 1 5 0.507721610 0.087464057 0.568040785 0.6756981 0.4285714 0.4534992
## 53 cg03979311 1 1 1 1 1 5 0.382684110 0.186487406 0.347111454 0.6904517 0.5714286 0.4356326
## 54 cg03982462 1 0 1 1 0 3 0.172825814 0.000000000 0.283015281 0.4285033 0.2857143 0.2340117
## 55 cg04242342 1 1 1 1 1 5 0.254073419 0.148115981 0.246109060 0.4987525 0.5714286 0.3436959
## 56 cg04248279 1 1 1 1 1 5 0.358205829 0.441270573 0.491761079 0.5912281 0.5714286 0.4907788
## 57 cg04316537 1 1 1 1 0 4 0.179217816 0.470879376 0.304453377 0.3374059 0.2857143 0.3155341
## 58 cg04412904 1 1 1 1 1 5 0.141747163 0.754870224 0.415016324 0.6881251 0.4285714 0.4856660
## 59 cg04462915 1 1 1 1 1 5 0.098734682 0.008250629 0.246377553 0.5805848 0.4285714 0.2725038
## 61 cg04645024 1 0 1 1 1 4 0.065497309 0.000000000 0.228122322 0.5790366 0.4285714 0.2602455
## 62 cg04664583 1 1 1 1 1 5 0.166766765 0.067909093 0.332835603 0.4707130 0.4285714 0.2933592
## 63 cg04718469 0 1 1 1 0 3 0.000000000 0.460027583 0.200486101 0.3453371 0.2857143 0.2583130
## 64 cg04728936 1 1 1 1 1 5 0.154928807 0.040878551 0.278561113 0.4127552 0.4285714 0.2631390
## 65 cg04768387 0 1 1 1 0 3 0.004621094 0.038891382 0.179519322 0.3400892 0.2857143 0.1697671
## 66 cg04831745 1 1 1 1 1 5 0.106863008 0.036646123 0.226553479 0.4092489 0.5714286 0.2701480
## 67 cg04875706 1 1 0 1 0 3 0.064484089 0.100486173 0.092950602 0.5862395 0.2857143 0.2259749
## 68 cg04888234 0 0 1 1 1 3 0.050262836 0.000000000 0.194417507 0.5361021 0.4285714 0.2418708
## 69 cg04971651 1 1 1 1 1 5 0.237777287 0.424926403 0.367083098 0.5925727 0.5714286 0.4387576
## 70 cg05130642 1 1 0 1 1 4 0.246803196 0.116647337 0.134279878 0.4962524 0.5714286 0.3130823
## 71 cg05155812 1 1 1 1 1 5 0.209490405 0.000000000 0.251413067 0.4887143 0.4285714 0.2756378
## 72 cg05161773 0 1 1 0 1 3 0.014505329 0.062417196 0.232468946 0.2099430 1.0000000 0.3038669
## 73 cg05234269 1 1 1 1 1 5 0.118251744 0.000000000 0.287909385 0.5674493 0.7142857 0.3375792
## 74 cg05392160 1 1 1 1 1 5 0.145994195 0.037741403 0.179360480 0.6503363 0.4285714 0.2884008
## 75 cg05570109 1 1 1 1 1 5 0.062012121 0.117249421 0.170234299 0.3897617 0.5714286 0.2621372
## 77 cg05799088 1 1 0 1 0 3 0.316920270 0.000000000 0.167020519 0.6750622 0.2857143 0.2889434
## 78 cg05850457 1 1 1 1 0 4 0.104365050 0.005153461 0.177328440 0.3749807 0.2857143 0.1895084
## 79 cg05876883 1 1 1 0 1 4 0.188442199 0.152031978 0.402981070 0.2264600 0.4285714 0.2796973
## 80 cg05891136 0 1 1 1 1 4 0.040201650 0.037698551 0.264564161 0.5766727 0.4285714 0.2695417
## 81 cg06012903 1 1 1 0 0 3 0.060085520 0.028103784 0.203864109 0.2171296 0.2857143 0.1589795
## 82 cg06115838 1 1 1 1 1 5 0.336333498 0.000000000 0.378945884 0.4487819 0.7142857 0.3756694
## 83 cg06118351 1 1 1 1 1 5 0.190221738 0.362362533 0.305861245 0.3693142 0.4285714 0.3312662
## 84 cg06231502 1 1 1 1 1 5 0.343073783 0.016798414 0.449307713 0.4925352 0.4285714 0.3460573
## [ reached 'max' / getOption("max.print") -- omitted 199 rows ]
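The `Average_Importance` column in the table above appears to be the plain row mean of the five per-model importance columns. A quick sanity check on the first listed feature (`age.now`), using the values reported in the table:

```r
# Recompute the average importance for age.now as the mean of the five
# per-model importance values shown in the table above.
imp_age_now <- c(LRM = 0.010439518, XGB = 0.769731412, ENM = 0.003919221,
                 RF  = 1.0000000,   SVM = 0.7142857)
avg <- mean(imp_age_now)  # matches the reported Average_Importance of 0.4996752
```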
# Output data frame with the selected features based on the mean method:
# "selected_impAvg_ordered_NAME". This data frame does not have a column named "SampleID".
if(Flag_8mean){
  filename_mean <- paste0("Selected_mean", "_", INPUT_NUMBER_FEATURES, "_Features.csv")
  OUTPUTPATH_mean <- paste0(OUTUT_CSV_PATHNAME, filename_mean)
  if (file.exists(OUTPUTPATH_mean)) {
    print("selected file based on mean already exists")
  } else {
    write.csv(df_selected_Mean,
              file = OUTPUTPATH_mean,
              row.names = FALSE)
  }
}
if(Flag_8median){
  filename_median <- paste0("Selected_median", "_", INPUT_NUMBER_FEATURES, "_Features.csv")
  OUTPUTPATH_median <- paste0(OUTUT_CSV_PATHNAME, filename_median)
  if (file.exists(OUTPUTPATH_median)) {
    print("selected file based on median already exists")
  } else {
    write.csv(df_selected_Median,
              file = OUTPUTPATH_median,
              row.names = FALSE)
  }
}
if(Flag_8Fequency){
  filename_frequency <- paste0("Selected_frequency", "_", INPUT_NUMBER_FEATURES, "_Features.csv")
  OUTPUTPATH_frequency <- paste0(OUTUT_CSV_PATHNAME, filename_frequency)
  if (file.exists(OUTPUTPATH_frequency)) {
    print("selected file based on frequency already exists")
  } else {
    write.csv(df_process_Output_freq,
              file = OUTPUTPATH_frequency,
              row.names = FALSE)
  }
}
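The three write-out blocks above repeat the same build-path / check-exists / write pattern. A minimal sketch of a shared helper (hypothetical, not part of the pipeline; it uses `file.path` instead of string concatenation, so the output directory need not end in a separator):

```r
# Hypothetical helper: write a selected-feature data frame to CSV only when
# the target file does not already exist; returns the path invisibly.
write_if_absent <- function(df, out_dir, method, n_features) {
  path <- file.path(out_dir, sprintf("Selected_%s_%s_Features.csv", method, n_features))
  if (file.exists(path)) {
    message("selected file based on ", method, " already exists")
  } else {
    write.csv(df, file = path, row.names = FALSE)
  }
  invisible(path)
}
```

Each flag block would then reduce to one call, e.g. `write_if_absent(df_selected_Mean, OUTUT_CSV_PATHNAME, "mean", INPUT_NUMBER_FEATURES)`.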
# Flag for the phenotype data output:
# if set to TRUE, write the file; the code first checks whether the file already exists at the given path and only writes it when it does not.
# if set to FALSE, do not output the phenotype file.
# NOTICE THAT: the phenotype file is selected from "Merged_df".
phenotypeDF<-merged_df_raw[,colnames(phenoticPart_RAW)]
print(head(phenotypeDF))
## barcodes RID.a prop.B prop.NK prop.CD4T prop.CD8T prop.Mono prop.Neutro prop.Eosino DX age.now PTGENDER ABETA TAU PTAU PC1 PC2
## 200223270003_R02C01 200223270003_R02C01 2190 0.03164651 0.03609239 0.010771839 0.01481567 0.06533409 0.8413395 0 MCI 82.40000 Male 963.2 341.5 35.48 -0.214185447 1.470293e-02
## 200223270003_R03C01 200223270003_R03C01 4080 0.03556363 0.04697771 0.002321312 0.06381941 0.04901806 0.8022999 0 CN 78.60000 Female 950.6 295.9 28.08 -0.172761185 5.745834e-02
## 200223270003_R06C01 200223270003_R06C01 4505 0.07129589 0.04412218 0.037684081 0.11457236 0.08745402 0.6448715 0 CN 80.40000 Female 1705.0 353.2 28.49 -0.003667305 8.372861e-02
## 200223270003_R07C01 200223270003_R07C01 1010 0.02081699 0.07117668 0.040966085 0.00000000 0.04459325 0.8224470 0 Dementia 78.16441 Male 493.3 272.8 22.75 -0.186779607 -1.117250e-02
## 200223270006_R01C01 200223270006_R01C01 4226 0.02680465 0.04767947 0.128514873 0.09085886 0.07419209 0.6319501 0 MCI 62.90000 Female 1705.0 253.1 22.84 0.026814649 1.650735e-05
## 200223270006_R04C01 200223270006_R04C01 1190 0.07063013 0.05250647 0.064529118 0.04309168 0.08796080 0.6812818 0 CN 80.67796 Female 1336.0 439.3 40.78 -0.037862929 1.571950e-02
## PC3 ageGroup ageGroupsq DX_num uniqueID Horvath
## 200223270003_R02C01 -0.014043316 0.6606949 0.43651772 0 1 61.50365
## 200223270003_R03C01 0.005055871 0.2806949 0.07878961 0 1 69.26678
## 200223270003_R06C01 0.029143653 0.4606949 0.21223977 0 1 96.84418
## 200223270003_R07C01 -0.032302430 0.2371357 0.05623333 1 1 61.76446
## 200223270006_R01C01 0.052947950 -1.2893051 1.66230770 0 1 59.33885
## 200223270006_R04C01 -0.008685676 0.4884909 0.23862336 0 1 70.27197
OUTPUTPATH_phenotypePart <- paste0(OUTUT_CSV_PATHNAME, "PhenotypePart_df.csv")
if(phenoOutPUt_FLAG ){
if (file.exists(OUTPUTPATH_phenotypePart)) {
  print("Phenotype File already exists")
} else {
  write.csv(phenotypeDF, file = OUTPUTPATH_phenotypePart, row.names = FALSE)
}
}
## [1] "Phenotype File already exists"
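The write-only-if-absent guard above is repeated for every output file; it can be factored into a small helper (a sketch — `write_if_absent` is not part of the original script):

```r
# Write `df` to `path` only when no file exists there yet.
# Returns TRUE if a new file was written, FALSE if it was skipped.
write_if_absent <- function(df, path) {
  if (file.exists(path)) {
    message("File already exists, skipping: ", path)
    return(FALSE)
  }
  write.csv(df, file = path, row.names = FALSE)
  TRUE
}

# Example: the second call is a no-op because the file already exists.
tmp_path <- file.path(tempdir(), "PhenotypePart_df_demo.csv")
write_if_absent(data.frame(DX = c("CN", "MCI")), tmp_path)
write_if_absent(data.frame(DX = c("CN", "MCI")), tmp_path)
```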
Performance of the features selected by the mean-importance method
processed_dataFrame<-df_selected_Mean
processed_data<-output_mean_process
AfterProcess_FeatureName<-selected_impAvg_ordered_NAME
print(head(output_mean_process))
## # A tibble: 6 × 251
## DX cg27272246 PC2 cg12543766 cg20685672 cg14687298 cg11331837 cg16652920 cg14168080 age.now cg04248279 cg24433124 cg04412904 cg07028768 cg14710850 cg08861434 cg06833284 cg03924089 cg20398163
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 MCI 0.862 1.47e-2 0.510 0.671 0.0421 0.0369 0.944 0.419 82.4 0.853 0.132 0.0509 0.450 0.805 0.877 0.913 0.792 0.173
## 2 CN 0.871 5.75e-2 0.887 0.793 0.148 0.572 0.943 0.442 78.6 0.846 0.599 0.0772 0.854 0.809 0.435 0.900 0.737 0.873
## 3 CN 0.810 8.37e-2 0.0282 0.661 0.243 0.0318 0.946 0.436 80.4 0.833 0.819 0.0825 0.836 0.829 0.870 0.610 0.851 0.262
## 4 MCI 0.769 1.65e-5 0.818 0.0829 0.513 0.930 0.953 0.946 62.9 0.597 0.592 0.119 0.884 0.850 0.862 0.0381 0.869 0.267
## 5 CN 0.440 1.57e-2 0.457 0.845 0.0362 0.540 0.949 0.399 80.7 0.894 0.574 0.0889 0.451 0.821 0.906 0.915 0.748 0.682
## 6 MCI 0.750 3.46e-2 0.804 0.657 0.241 0.924 0.949 0.950 80.6 0.273 0.606 0.117 0.849 0.845 0.468 0.901 0.753 0.829
## # ℹ 232 more variables: cg00004073 <dbl>, cg00962106 <dbl>, cg10240127 <dbl>, cg06634367 <dbl>, cg02225060 <dbl>, cg04971651 <dbl>, cg09015880 <dbl>, cg19799454 <dbl>, cg03979311 <dbl>,
## # cg07640670 <dbl>, cg08198851 <dbl>, cg02981548 <dbl>, cg11169344 <dbl>, cg06961873 <dbl>, cg23432430 <dbl>, cg06483046 <dbl>, cg07480955 <dbl>, cg02621446 <dbl>, cg26081710 <dbl>,
## # cg00767423 <dbl>, cg22741595 <dbl>, cg13405878 <dbl>, cg10978526 <dbl>, cg08880261 <dbl>, cg22535849 <dbl>, cg06546677 <dbl>, cg20078646 <dbl>, cg17129965 <dbl>, cg08779649 <dbl>,
## # cg23836570 <dbl>, cg15633912 <dbl>, cg23517115 <dbl>, cg26705599 <dbl>, cg18285382 <dbl>, cg18819889 <dbl>, cg23352245 <dbl>, cg12228670 <dbl>, cg26901661 <dbl>, cg02772171 <dbl>,
## # cg06286533 <dbl>, cg07104639 <dbl>, cg17042243 <dbl>, cg06115838 <dbl>, cg15098922 <dbl>, cg07478795 <dbl>, cg08788093 <dbl>, cg12784167 <dbl>, cg26219488 <dbl>, cg22071943 <dbl>,
## # cg21415084 <dbl>, cg01921484 <dbl>, cg02887598 <dbl>, cg18526121 <dbl>, cg02631626 <dbl>, cg09289202 <dbl>, cg23066280 <dbl>, cg08857872 <dbl>, cg00819121 <dbl>, cg07504457 <dbl>,
## # cg11438323 <dbl>, cg07158503 <dbl>, cg19471911 <dbl>, cg14564293 <dbl>, cg18816397 <dbl>, cg27086157 <dbl>, PC1 <dbl>, cg03749159 <dbl>, cg21783012 <dbl>, cg09584650 <dbl>, cg21243064 <dbl>, …
print(selected_impAvg_ordered_NAME)
## [1] "cg27272246" "PC2" "cg12543766" "cg20685672" "cg14687298" "cg11331837" "cg16652920" "cg14168080" "age.now" "cg04248279" "cg24433124" "cg04412904" "cg07028768" "cg14710850" "cg08861434"
## [16] "cg06833284" "cg03924089" "cg20398163" "cg00004073" "cg00962106" "cg10240127" "cg06634367" "cg02225060" "cg04971651" "cg09015880" "cg19799454" "cg03979311" "cg07640670" "cg08198851" "cg02981548"
## [31] "cg11169344" "cg06961873" "cg23432430" "cg06483046" "cg07480955" "cg02621446" "cg26081710" "cg00767423" "cg22741595" "cg13405878" "cg10978526" "cg08880261" "cg22535849" "cg06546677" "cg20078646"
## [46] "cg17129965" "cg08779649" "cg23836570" "cg15633912" "cg23517115" "cg26705599" "cg18285382" "cg18819889" "cg23352245" "cg12228670" "cg26901661" "cg02772171" "cg06286533" "cg07104639" "cg17042243"
## [61] "cg06115838" "cg15098922" "cg07478795" "cg08788093" "cg12784167" "cg26219488" "cg22071943" "cg21415084" "cg01921484" "cg02887598" "cg18526121" "cg02631626" "cg09289202" "cg23066280" "cg08857872"
## [76] "cg00819121" "cg07504457" "cg11438323" "cg07158503" "cg19471911" "cg14564293" "cg18816397" "cg27086157" "PC1" "cg03749159" "cg21783012" "cg09584650" "cg21243064" "cg06231502" "cg00696044"
## [91] "cg14175932" "cg04242342" "cg10738049" "cg15501526" "cg21392220" "cg00322003" "cg05234269" "cg16779438" "cg14293999" "cg03723481" "cg06118351" "cg00086247" "cg15138543" "cg18918831" "cg12702014"
## [106] "cg25598710" "cg10681981" "cg01128042" "cg03395511" "cg22933800" "cg16655091" "cg17018422" "cg14228103" "cg11019791" "cg19097407" "cg23658987" "cg08138245" "cg24139837" "cg14507637" "cg04316537"
## [121] "cg12776173" "cg20300784" "cg17429539" "cg06394820" "cg21388339" "cg05130642" "cg12953206" "cg15600437" "cg25208881" "cg17738613" "cg03660162" "cg00553601" "cg11268585" "cg25366315" "cg00084271"
## [136] "cg16715186" "cg02356645" "cg26069044" "cg05161773" "cg11286989" "cg26679884" "cg21507367" "cg27160885" "cg04664583" "cg21812850" "cg03737947" "cg16771215" "cg05799088" "cg22112152" "cg05392160"
## [151] "cg17653352" "cg02372404" "cg08745107" "cg26983017" "cg25436480" "cg21209485" "cg21139150" "cg03327352" "cg23923019" "cg18150287" "cg15535896" "cg05876883" "cg23159970" "cg06880438" "cg02246922"
## [166] "cg25649515" "cg05155812" "cg17186592" "cg24851651" "cg15985500" "cg02464073" "cg08514194" "cg10738648" "cg11187460" "cg27577781" "cg10091792" "cg13815695" "cg26948066" "cg25306893" "cg03129555"
## [181] "cg04462915" "cg06697310" "cg14582632" "cg19301366" "cg10666341" "cg03221390" "cg22169467" "cg04831745" "cg06864789" "cg01933473" "cg05891136" "cg15586958" "cg26853071" "cg11227702" "cg15491125"
## [196] "cg16571124" "cg10039445" "cg09247979" "cg04728936" "cg13573375" "cg05570109" "cg12421087" "cg00154902" "cg04645024" "cg13739190" "cg20208879" "cg04718469" "cg08669168" "cg11314779" "cg25879395"
## [211] "cg06403901" "cg09727210" "cg19377607" "cg01549082" "cg06371647" "cg12012426" "cg03549208" "cg18993517" "cg22666875" "cg01008088" "cg12333628" "cg09216282" "cg12146221" "cg14192979" "cg22542451"
## [226] "cg03635532" "cg07634717" "cg10993865" "cg14307563" "cg14623940" "cg16089727" "cg26846609" "cg04888234" "cg17268094" "cg06960717" "cg26642936" "cg14649234" "cg06715136" "cg07227024" "cg15775217"
## [241] "cg11540596" "cg16536985" "cg03088219" "cg00689685" "cg01153376" "cg16180556" "cg25169289" "cg03982462" "cg24883219" "cg14240646"
print(head(df_selected_Mean))
## DX cg27272246 PC2 cg12543766 cg20685672 cg14687298 cg11331837 cg16652920 cg14168080 age.now cg04248279 cg24433124 cg04412904 cg07028768 cg14710850 cg08861434 cg06833284
## 200223270003_R02C01 MCI 0.8615873 0.01470293 0.51028134 0.6712101 0.04206702 0.03692842 0.9436000 0.4190123 82.4 0.8534976 0.1316610 0.05088595 0.4496851 0.8048592 0.8768306 0.9125144
## 200223270003_R03C01 CN 0.8705287 0.05745834 0.88741539 0.7932091 0.14813581 0.57150125 0.9431222 0.4420256 78.6 0.8458854 0.5987648 0.07717659 0.8536078 0.8090950 0.4352647 0.9003482
## 200223270003_R06C01 CN 0.8103777 0.08372861 0.02818501 0.6613646 0.24260002 0.03182862 0.9457161 0.4355521 80.4 0.8332786 0.8188082 0.08253743 0.8356936 0.8285902 0.8698813 0.6097933
## cg03924089 cg20398163 cg00004073 cg00962106 cg10240127 cg06634367 cg02225060 cg04971651 cg09015880 cg19799454 cg03979311 cg07640670 cg08198851 cg02981548 cg11169344 cg06961873
## 200223270003_R02C01 0.7920449 0.1728144 0.02928535 0.9124898 0.9250553 0.8695793 0.6828159 0.8902474 0.5101716 0.9178930 0.86644909 0.58296513 0.6578905 0.1342571 0.6720163 0.5335591
## 200223270003_R03C01 0.7370283 0.8728944 0.02787198 0.5375751 0.9403255 0.9512930 0.8265195 0.9219452 0.8402106 0.9106247 0.06199853 0.55225610 0.6578186 0.5220037 0.8215477 0.5472606
## 200223270003_R06C01 0.8506756 0.2623391 0.64576463 0.5040948 0.9056974 0.9544163 0.5209552 0.9035233 0.8472063 0.9066551 0.72615553 0.04058533 0.1272153 0.5098965 0.5941114 0.9415177
## cg23432430 cg06483046 cg07480955 cg02621446 cg26081710 cg00767423 cg22741595 cg13405878 cg10978526 cg08880261 cg22535849 cg06546677 cg20078646 cg17129965 cg08779649 cg23836570
## 200223270003_R02C01 0.9482702 0.04383925 0.3874638 0.8731313 0.8751040 0.9298253 0.6525533 0.4549662 0.5671930 0.40655904 0.8847704 0.4472216 0.06198170 0.8972140 0.44449401 0.58688450
## 200223270003_R03C01 0.9455418 0.50720277 0.3916889 0.8095534 0.9198212 0.2651854 0.1730013 0.7858042 0.9095713 0.85616966 0.8609966 0.8484609 0.89537412 0.8806673 0.45076825 0.54259383
## 200223270003_R06C01 0.9418716 0.89604910 0.4043390 0.7511582 0.8801892 0.8667808 0.1550739 0.7583938 0.8945157 0.03280808 0.8808022 0.5636023 0.08725521 0.8857237 0.04810217 0.03267304
## cg15633912 cg23517115 cg26705599 cg18285382 cg18819889 cg23352245 cg12228670 cg26901661 cg02772171 cg06286533 cg07104639 cg17042243 cg06115838 cg15098922 cg07478795 cg08788093
## 200223270003_R02C01 0.1605530 0.2151144 0.8585917 0.3202927 0.9156157 0.9377232 0.8632174 0.8951971 0.9182018 0.2734841 0.6772717 0.2502905 0.8847724 0.9286092 0.8911007 0.03911678
## 200223270003_R03C01 0.9333421 0.9131440 0.8613854 0.2930577 0.9004455 0.9375774 0.8496212 0.8754981 0.5660559 0.9354924 0.7123879 0.2933475 0.8447916 0.9027517 0.9095543 0.60934160
## 200223270003_R06C01 0.8737362 0.8328364 0.4332832 0.8923595 0.9054439 0.5932742 0.8738949 0.9021064 0.8995479 0.8696546 0.8099688 0.2725457 0.8805585 0.8525611 0.8905903 0.88380243
## cg12784167 cg26219488 cg22071943 cg21415084 cg01921484 cg02887598 cg18526121 cg02631626 cg09289202 cg23066280 cg08857872 cg00819121 cg07504457 cg11438323 cg07158503 cg19471911
## 200223270003_R02C01 0.81503498 0.9336638 0.8705217 0.8374415 0.9098550 0.04020908 0.4519781 0.6280766 0.4361103 0.07247841 0.3395280 0.9207001 0.7116230 0.4863471 0.5777146 0.6334393
## 200223270003_R03C01 0.02811410 0.9134707 0.2442648 0.8509420 0.9093137 0.67073881 0.4762313 0.1951736 0.4397504 0.57174588 0.8181845 0.9281472 0.6854539 0.8984559 0.6203543 0.8437175
## 200223270003_R06C01 0.03073269 0.9261878 0.2644581 0.8378237 0.9204487 0.73408417 0.4833367 0.2699849 0.4193555 0.80814756 0.2970779 0.9327211 0.7205633 0.8722772 0.6236025 0.6127952
## cg14564293 cg18816397 cg27086157 PC1 cg03749159 cg21783012 cg09584650 cg21243064 cg06231502 cg00696044 cg14175932 cg04242342 cg10738049 cg15501526 cg21392220 cg00322003
## 200223270003_R02C01 0.52089591 0.5472925 0.9224112 -0.214185447 0.9355921 0.9142369 0.08230254 0.5191606 0.7784451 0.55608424 0.5746953 0.8206769 0.5441211 0.6362531 0.8726204 0.1759911
## 200223270003_R03C01 0.04000662 0.4940355 0.9219304 -0.172761185 0.9153921 0.6694884 0.09661586 0.9167649 0.7964278 0.07552381 0.8779027 0.8167892 0.5232715 0.6319253 0.8563905 0.5702070
## 200223270003_R06C01 0.04959460 0.5337018 0.3224986 -0.003667305 0.9255807 0.9070112 0.52399749 0.4862205 0.7706160 0.79270858 0.7288239 0.8040357 0.4875473 0.7435100 0.8466199 0.3077122
## cg05234269 cg16779438 cg14293999 cg03723481 cg06118351 cg00086247 cg15138543 cg18918831 cg12702014 cg25598710 cg10681981 cg01128042 cg03395511 cg22933800 cg16655091 cg17018422
## 200223270003_R02C01 0.93848584 0.8826150 0.2836710 0.4347333 0.3633940 0.1761275 0.7734778 0.4891660 0.7704049 0.3105752 0.7035090 0.9113420 0.4491605 0.4830774 0.6055295 0.5262747
## 200223270003_R03C01 0.57461229 0.5466924 0.9172023 0.9007774 0.4714860 0.2045043 0.2949313 0.5333801 0.7848681 0.3088142 0.7382662 0.5328806 0.4835967 0.4142525 0.7053336 0.9029604
## 200223270003_R06C01 0.02467208 0.8629492 0.9168166 0.8947417 0.8655962 0.6901217 0.2496147 0.6406575 0.8065993 0.8538820 0.6971989 0.5222757 0.5523959 0.3956683 0.8724479 0.5100750
## cg14228103 cg11019791 cg19097407 cg23658987 cg08138245 cg24139837 cg14507637 cg04316537 cg12776173 cg20300784 cg17429539 cg06394820 cg21388339 cg05130642 cg12953206 cg15600437
## 200223270003_R02C01 0.9141064 0.8112324 0.1417931 0.79757644 0.8115760 0.07404605 0.9051258 0.8074830 0.1038804 0.86585964 0.7860900 0.8513195 0.2756268 0.8575504 0.2364836 0.4885353
## 200223270003_R03C01 0.8591302 0.7831231 0.8367297 0.07511718 0.1109940 0.04183445 0.9009460 0.8453340 0.8730635 0.86609999 0.7100923 0.8695521 0.2102269 0.8644077 0.2338141 0.4894487
## 200223270003_R06C01 0.1834348 0.4353250 0.2276425 0.10177571 0.7444698 0.05657120 0.9013686 0.4351695 0.7009491 0.03091187 0.7660838 0.4415020 0.7649181 0.3661324 0.6638030 0.8551374
## cg25208881 cg17738613 cg03660162 cg00553601 cg11268585 cg25366315 cg00084271 cg16715186 cg02356645 cg26069044 cg05161773 cg11286989 cg26679884 cg21507367 cg27160885 cg04664583
## 200223270003_R02C01 0.1851956 0.6879612 0.8691767 0.05601299 0.2521544 0.9182318 0.8103611 0.2742789 0.5105903 0.9240187 0.4120912 0.7590008 0.6793815 0.9268560 0.2231606 0.5572814
## 200223270003_R03C01 0.9092286 0.6582258 0.5160770 0.58957701 0.8535791 0.9209800 0.7877006 0.7946153 0.5833923 0.9407223 0.4154907 0.8533989 0.1848705 0.9290102 0.8263885 0.5881190
## 200223270003_R06C01 0.9265502 0.1022257 0.9026304 0.62426500 0.9121931 0.8972984 0.7706165 0.8124316 0.5701428 0.9332131 0.8526849 0.7313884 0.1701734 0.9039559 0.2121179 0.9352717
## cg21812850 cg03737947 cg16771215 cg05799088 cg22112152 cg05392160 cg17653352 cg02372404 cg08745107 cg26983017 cg25436480 cg21209485 cg21139150 cg03327352 cg23923019 cg18150287
## 200223270003_R02C01 0.7920645 0.91824910 0.88389723 0.9023317 0.8476101 0.9328933 0.9269778 0.03598249 0.02921338 0.89868232 0.8425160 0.8865053 0.01853264 0.8851712 0.8555018 0.7685695
## 200223270003_R03C01 0.7688711 0.92067153 0.07196933 0.8779381 0.8014136 0.2576881 0.9086951 0.02767285 0.78542320 0.03145466 0.4994032 0.8714878 0.43223243 0.8786878 0.3058914 0.7519166
## 200223270003_R06C01 0.7702792 0.03638091 0.09949974 0.6887230 0.7897897 0.8920726 0.9341775 0.03127855 0.02709928 0.84677625 0.3494312 0.2292550 0.43772680 0.3042310 0.8108207 0.2501173
## cg15535896 cg05876883 cg23159970 cg06880438 cg02246922 cg25649515 cg05155812 cg17186592 cg24851651 cg15985500 cg02464073 cg08514194 cg10738648 cg11187460 cg27577781 cg10091792
## 200223270003_R02C01 0.3382952 0.9039064 0.61817246 0.8285145 0.7301201 0.9279829 0.4514427 0.9230463 0.03674702 0.8555262 0.4842537 0.9128478 0.44931577 0.03672179 0.8143535 0.8670733
## 200223270003_R03C01 0.9253926 0.9223308 0.57492600 0.7988881 0.9447019 0.9235753 0.9070932 0.8593448 0.05358297 0.8312198 0.4998933 0.2613138 0.49894016 0.92516409 0.8113185 0.5864221
## 200223270003_R06C01 0.3320191 0.4697980 0.03288909 0.7839538 0.7202230 0.5895839 0.4107396 0.8467599 0.05968923 0.8492103 0.9077933 0.9202187 0.05552024 0.03109553 0.8144274 0.6087997
## cg13815695 cg26948066 cg25306893 cg03129555 cg04462915 cg06697310 cg14582632 cg19301366 cg10666341 cg03221390 cg22169467 cg04831745 cg06864789 cg01933473 cg05891136 cg15586958
## 200223270003_R02C01 0.9267057 0.4685225 0.6265392 0.6079616 0.03224861 0.8454609 0.8475098 0.8831393 0.9046648 0.5859063 0.3095010 0.61984995 0.05369415 0.2589014 0.7797403 0.9058263
## 200223270003_R03C01 0.6859729 0.5026045 0.8330282 0.5785498 0.50740695 0.8653044 0.5526692 0.8072679 0.6731062 0.9180706 0.2978585 0.71214149 0.46053125 0.6726133 0.3310206 0.8957526
## 200223270003_R06C01 0.6509046 0.9101976 0.6175380 0.9137818 0.02700644 0.2405168 0.5288675 0.8796022 0.6443180 0.6399867 0.8955853 0.06871768 0.87513655 0.2642560 0.7965298 0.9121763
## cg26853071 cg11227702 cg15491125 cg16571124 cg10039445 cg09247979 cg04728936 cg13573375 cg05570109 cg12421087 cg00154902 cg04645024 cg13739190 cg20208879 cg04718469 cg08669168
## 200223270003_R02C01 0.4233820 0.86486075 0.9066635 0.9282854 0.8833873 0.5070956 0.2172057 0.8670419 0.3466611 0.5647607 0.5137741 0.7366541 0.8510103 0.66986658 0.8687522 0.9226769
## 200223270003_R03C01 0.7451354 0.49184121 0.3850991 0.9206431 0.8954055 0.5706177 0.1925451 0.1733934 0.5866750 0.5399655 0.8540746 0.8454827 0.8358482 0.02423079 0.7256813 0.9164547
## 200223270003_R06C01 0.4228079 0.02543724 0.9091504 0.9276842 0.8832807 0.5090215 0.2379376 0.8888246 0.4046471 0.5400348 0.8188126 0.0871902 0.8419471 0.61769424 0.8521881 0.6362087
## cg11314779 cg25879395 cg06403901 cg09727210 cg19377607 cg01549082 cg06371647 cg12012426 cg03549208 cg18993517 cg22666875 cg01008088 cg12333628 cg09216282 cg12146221 cg14192979
## 200223270003_R02C01 0.0242134 0.88130864 0.92790690 0.4240111 0.05377464 0.2924138 0.8336894 0.9165048 0.9014487 0.2091538 0.8177182 0.8424817 0.9227884 0.9349248 0.2049284 0.06336040
## 200223270003_R03C01 0.8966100 0.02603438 0.04783341 0.8812928 0.90570746 0.7065693 0.8198684 0.9434768 0.8381784 0.2665896 0.8291957 0.2417656 0.9092861 0.9244259 0.1814927 0.06019651
## 200223270003_R06C01 0.8908661 0.91060615 0.05253626 0.8493743 0.06636174 0.2895440 0.8069537 0.9220044 0.9097817 0.2574003 0.3694180 0.2618620 0.5084647 0.9263996 0.8619250 0.52114282
## cg22542451 cg03635532 cg07634717 cg10993865 cg14307563 cg14623940 cg16089727 cg26846609 cg04888234 cg17268094 cg06960717 cg26642936 cg14649234 cg06715136 cg07227024 cg15775217
## 200223270003_R02C01 0.5884356 0.8416733 0.7483382 0.9173768 0.1855966 0.7623774 0.86748697 0.48860949 0.8379655 0.5774753 0.7030978 0.7619266 0.05165754 0.3400192 0.04553128 0.5707441
## 200223270003_R03C01 0.8337068 0.8262538 0.8254434 0.9096170 0.8916957 0.8732905 0.54996692 0.04878986 0.4376314 0.9003262 0.7653402 0.7023413 0.79015014 0.9259109 0.05004286 0.9168327
## 200223270003_R06C01 0.8125084 0.8450480 0.8181246 0.4904519 0.8750052 0.8661720 0.05876736 0.48026945 0.8039047 0.8789368 0.7206218 0.7099380 0.65413166 0.9079807 0.06152206 0.6042521
## cg11540596 cg16536985 cg03088219 cg00689685 cg01153376 cg16180556 cg25169289 cg03982462 cg24883219 cg14240646
## 200223270003_R02C01 0.9238951 0.5789643 0.844002862 0.7019389 0.4872148 0.39300141 0.1100884 0.8562777 0.6430473 0.5391334
## 200223270003_R03C01 0.8926595 0.5418687 0.007435243 0.8634268 0.9639670 0.07312155 0.7667174 0.6023731 0.6822115 0.2538363
## 200223270003_R06C01 0.8820252 0.8392044 0.120155222 0.6378795 0.2242410 0.20051805 0.2264993 0.8778458 0.5296903 0.1864902
## [ reached 'max' / getOption("max.print") -- omitted 3 rows ]
df_LRM1<-processed_data
featureName_LRM1<-AfterProcess_FeatureName
library(glmnet)
library(caret)
set.seed(123)
trainIndex <- createDataPartition(df_LRM1$DX, p = 0.7, list = FALSE)
trainData <- df_LRM1[trainIndex, ]
testData <- df_LRM1[-trainIndex, ]
dim(trainData)
## [1] 389 251
dim(testData)
## [1] 165 251
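`createDataPartition` draws the 70% split within each `DX` level, which is why train and test keep the roughly 40/60 CN/MCI mix shown further below. A base-R sketch of the same idea on a toy factor (`stratified_index` is our illustrative helper, not a caret function):

```r
set.seed(123)

# Draw a proportion p of the indices within each class separately,
# so the class mix of the sample matches the class mix of y.
stratified_index <- function(y, p) {
  unlist(lapply(split(seq_along(y), y),
                function(idx) sample(idx, size = round(p * length(idx)))),
         use.names = FALSE)
}

y_toy <- factor(rep(c("CN", "MCI"), times = c(40, 60)))
idx   <- stratified_index(y_toy, p = 0.7)
prop.table(table(y_toy[idx]))   # 0.4 / 0.6, mirroring the full toy data
```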
ctrl <- trainControl(method = "cv", number = 5)
model_LRM1 <- caret::train(DX ~ ., data = trainData, method = "glmnet", trControl = ctrl)
predictions <- predict(model_LRM1, newdata = testData,type="raw")
cm_FeatEval_Mean_LRM1<-caret::confusionMatrix(predictions, testData$DX)
print(cm_FeatEval_Mean_LRM1)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CN MCI
## CN 54 16
## MCI 12 83
##
## Accuracy : 0.8303
## 95% CI : (0.7642, 0.8842)
## No Information Rate : 0.6
## P-Value [Acc > NIR] : 1.475e-10
##
## Kappa : 0.65
##
## Mcnemar's Test P-Value : 0.5708
##
## Sensitivity : 0.8182
## Specificity : 0.8384
## Pos Pred Value : 0.7714
## Neg Pred Value : 0.8737
## Prevalence : 0.4000
## Detection Rate : 0.3273
## Detection Prevalence : 0.4242
## Balanced Accuracy : 0.8283
##
## 'Positive' Class : CN
##
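The headline metrics above follow directly from the 2×2 table; recomputing them from the counts (with CN as the positive class) makes the definitions explicit:

```r
# Counts copied from the confusion matrix above (CN is the positive class).
TP <- 54; FN <- 12    # true CN predicted CN / predicted MCI
FP <- 16; TN <- 83    # true MCI predicted CN / predicted MCI
n  <- TP + FP + FN + TN

sensitivity <- TP / (TP + FN)      # 54/66   = 0.8182
specificity <- TN / (TN + FP)      # 83/99   = 0.8384
accuracy    <- (TP + TN) / n       # 137/165 = 0.8303

# Cohen's kappa: observed agreement corrected for chance agreement.
p_chance <- ((TP + FP) * (TP + FN) + (FN + TN) * (FP + TN)) / n^2
kappa    <- (accuracy - p_chance) / (1 - p_chance)   # 0.65, as reported
round(c(sensitivity, specificity, accuracy, kappa), 4)
```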
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
cm_FeatEval_Mean_LRM1_Accuracy <- cm_FeatEval_Mean_LRM1$overall["Accuracy"]
cm_FeatEval_Mean_LRM1_Kappa <- cm_FeatEval_Mean_LRM1$overall["Kappa"]
print(cm_FeatEval_Mean_LRM1_Accuracy)
## Accuracy
## 0.830303
print(cm_FeatEval_Mean_LRM1_Kappa)
## Kappa
## 0.65
print(model_LRM1)
## glmnet
##
## 389 samples
## 250 predictors
## 2 classes: 'CN', 'MCI'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 311, 312, 311, 311, 311
## Resampling results across tuning parameters:
##
## alpha lambda Accuracy Kappa
## 0.10 0.0001780646 0.8714286 0.7286955
## 0.10 0.0017806455 0.8663004 0.7173851
## 0.10 0.0178064554 0.8663004 0.7159978
## 0.55 0.0001780646 0.8303696 0.6411043
## 0.55 0.0017806455 0.8046287 0.5844816
## 0.55 0.0178064554 0.7712288 0.5078327
## 1.00 0.0001780646 0.7763237 0.5246554
## 1.00 0.0017806455 0.7686647 0.5061579
## 1.00 0.0178064554 0.7326673 0.4308883
##
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0.1 and lambda = 0.0001780646.
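Note that the winning `alpha = 0.1` sits at the edge of caret's default three-point grid, so a denser custom grid around that region may be worth a pass via `tuneGrid` (the alpha/lambda values below are illustrative, not from the original run):

```r
# A denser elastic-net grid near the region the default search favoured.
tune_grid <- expand.grid(
  alpha  = seq(0, 0.3, by = 0.05),          # 7 mixing values, including ridge
  lambda = 10^seq(-5, -1, length.out = 10)  # 10 penalties on a log scale
)
nrow(tune_grid)   # 70 candidate models

# Plugging it into the existing call (requires caret + glmnet):
# model_LRM1b <- caret::train(DX ~ ., data = trainData, method = "glmnet",
#                             trControl = ctrl, tuneGrid = tune_grid)
```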
train_predictions <- predict(model_LRM1, newdata = trainData, type = "raw")
train_accuracy <- mean(train_predictions == trainData$DX)
FeatEval_Mean_LRM1_trainAccuracy<-train_accuracy
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 1"
print(FeatEval_Mean_LRM1_trainAccuracy)
## [1] 1
mean_accuracy_model_LRM1 <- mean(model_LRM1$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_model_LRM1)
## [1] 0.809768
FeatEval_Mean_mean_accuracy_cv_LRM1 <- mean_accuracy_model_LRM1
print(FeatEval_Mean_mean_accuracy_cv_LRM1)
## [1] 0.809768
library(caret)
library(pROC)
if (METHOD_FEATURE_FLAG ==5){
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
roc_curve <- roc(testData$DX,
prob_predictions[, "MCI"],
levels = rev(levels(testData$DX)))
auc_value <- roc_curve$auc
FeatEval_Mean_LRM1_AUC <- auc_value
print(roc_curve)
print("The auc value is:")
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls > cases
##
## Call:
## roc.default(response = testData$DX, predictor = prob_predictions[, "MCI"], levels = rev(levels(testData$DX)))
##
## Data: prob_predictions[, "MCI"] in 99 controls (testData$DX MCI) > 66 cases (testData$DX CN).
## Area under the curve: 0.8858
## [1] "The auc value is:"
## Area under the curve: 0.8858
if (METHOD_FEATURE_FLAG ==4 || METHOD_FEATURE_FLAG ==6){
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
roc_curve <- roc(testData$DX,
prob_predictions[, "Dementia"],
levels = rev(levels(testData$DX)))
auc_value <- roc_curve$auc
FeatEval_Mean_LRM1_AUC <- auc_value
print(roc_curve)
print("The auc value is:")
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG ==3){
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
roc_curve <- roc(testData$DX,
prob_predictions[, "CI"],
levels = rev(levels(testData$DX)))
auc_value <- roc_curve$auc
FeatEval_Mean_LRM1_AUC <- auc_value
print(roc_curve)
print("The auc value is:")
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG ==1){
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(testData$DX)
for (class in classes) {
binary_labels <- ifelse(testData$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
plot(roc_curves[[1]], col = "blue",
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = i+1, lwd = 2)
}
legend("bottomright", legend = classes, col = c("blue", (2:length(classes)) + 1), lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
FeatEval_Mean_LRM1_AUC <- mean_auc
}
print(FeatEval_Mean_LRM1_AUC)
## Area under the curve: 0.8858
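The AUC that `pROC::roc` reports equals the probability that a randomly chosen positive case scores higher than a randomly chosen negative one (the Mann–Whitney statistic). A base-R sanity check on toy labels and scores (hypothetical, not the model's predictions):

```r
# AUC as the concordance probability, computed from mid-ranks (ties handled).
auc_by_ranks <- function(labels, scores, positive = 1) {
  pos <- scores[labels == positive]
  neg <- scores[labels != positive]
  r   <- rank(c(pos, neg))
  (sum(r[seq_along(pos)]) - length(pos) * (length(pos) + 1) / 2) /
    (length(pos) * length(neg))
}

toy_labels <- c(1, 1, 1, 0, 0)
toy_scores <- c(0.9, 0.8, 0.4, 0.5, 0.2)
auc_by_ranks(toy_labels, toy_scores)   # 5 of 6 pos/neg pairs are concordant
```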
importance_model_LRM1 <- varImp(model_LRM1)
print(importance_model_LRM1)
## glmnet variable importance
##
## only 20 most important variables shown (out of 250)
##
## Overall
## PC2 100.00
## cg27272246 56.89
## cg00004073 55.63
## cg13405878 51.96
## cg23432430 49.29
## cg03924089 46.49
## cg06833284 45.65
## cg20685672 45.54
## cg02981548 44.81
## cg08861434 44.77
## cg14710850 44.68
## cg08788093 44.67
## cg14168080 43.55
## cg19471911 43.12
## cg14687298 42.70
## cg07480955 42.64
## cg14582632 42.00
## cg22933800 41.56
## cg02225060 40.95
## cg06634367 40.09
plot(importance_model_LRM1, top = 20, main = "Variable Importance Plot")
importance_model_LRM1_df<-importance_model_LRM1$importance
if(METHOD_FEATURE_FLAG==3 || METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG==5 ||METHOD_FEATURE_FLAG==6){
importance_final_model_LRM1 <- varImp(model_LRM1$finalModel)
library(dplyr)
# Note: arrange() drops data-frame rownames, so the printout below shows
# importance values without their feature names.
ordered_importance_final_model_LRM1 <- importance_final_model_LRM1 %>% arrange(desc(Overall))
print(ordered_importance_final_model_LRM1)
}
## Overall
## 1 7.82161514
## 2 4.44949582
## 3 4.35100936
## 4 4.06430841
## 5 3.85510615
## 6 3.63604516
## 7 3.57029432
## 8 3.56211708
## 9 3.50477644
## 10 3.50141054
## 11 3.49493190
## 12 3.49361173
## 13 3.40605057
## 14 3.37249117
## 15 3.33955791
## 16 3.33482200
## 17 3.28535638
## 18 3.25082253
## 19 3.20315224
## 20 3.13551741
## 21 3.11103499
## 22 3.04512715
## 23 3.02120323
## 24 3.01058165
## 25 2.97027258
## 26 2.93788920
## 27 2.92524417
## 28 2.92471831
## 29 2.91907587
## 30 2.88516732
## 31 2.87088765
## 32 2.85685928
## 33 2.84747049
## 34 2.78357155
## 35 2.76420772
## 36 2.72696513
## 37 2.69650847
## 38 2.68981331
## 39 2.68098027
## 40 2.60357483
## 41 2.60178391
## 42 2.57082850
## 43 2.50941906
## 44 2.42360989
## 45 2.37917265
## 46 2.36264577
## 47 2.34755882
## 48 2.33741251
## 49 2.30605471
## 50 2.30350412
## 51 2.27562646
## 52 2.26223202
## 53 2.24826855
## 54 2.19389985
## 55 2.16290173
## 56 2.15471646
## 57 2.13376032
## 58 2.12966271
## 59 2.12730265
## 60 2.08793704
## 61 2.05769709
## 62 2.04020206
## 63 2.03241528
## 64 2.01581863
## 65 2.00798873
## 66 2.00703913
## 67 1.99080192
## 68 1.98476967
## 69 1.97424707
## 70 1.95408875
## 71 1.92717039
## 72 1.90587503
## 73 1.89872914
## 74 1.89813028
## 75 1.89474589
## 76 1.87517470
## 77 1.87476545
## 78 1.86109870
## 79 1.85312894
## 80 1.83731791
## 81 1.82261685
## 82 1.80754694
## 83 1.80000302
## 84 1.79046744
## 85 1.77203641
## 86 1.72735058
## 87 1.72429032
## 88 1.71345590
## 89 1.71230127
## 90 1.71075264
## 91 1.70649321
## 92 1.70024985
## 93 1.68257402
## 94 1.67593186
## 95 1.67431495
## 96 1.67329117
## 97 1.65629653
## 98 1.65376988
## 99 1.65237805
## 100 1.64755641
## 101 1.64695571
## 102 1.63431934
## 103 1.63043580
## 104 1.62900466
## 105 1.61451474
## 106 1.56718675
## 107 1.56661775
## 108 1.55698124
## 109 1.53697326
## 110 1.52827002
## 111 1.49684108
## 112 1.48887516
## 113 1.48565291
## 114 1.47227129
## 115 1.47192699
## 116 1.46878874
## 117 1.45602133
## 118 1.44256715
## 119 1.43296094
## 120 1.35638836
## 121 1.35291819
## 122 1.34931647
## 123 1.34736511
## 124 1.33082266
## 125 1.32511876
## 126 1.32192827
## 127 1.31702386
## 128 1.30934919
## 129 1.28238516
## 130 1.28188536
## 131 1.26341031
## 132 1.26102326
## 133 1.25470228
## 134 1.23915228
## 135 1.22225061
## 136 1.20718003
## 137 1.20572911
## 138 1.20390975
## 139 1.19913359
## 140 1.18533030
## 141 1.17274440
## 142 1.17109922
## 143 1.16171767
## 144 1.16154741
## 145 1.13014470
## 146 1.12894070
## 147 1.12260827
## 148 1.12100959
## 149 1.09992324
## 150 1.09200334
## 151 1.08695312
## 152 1.07703503
## 153 1.06841065
## 154 1.03427430
## 155 1.03135076
## 156 1.02434669
## 157 1.02405885
## 158 1.01420709
## 159 1.00727392
## 160 0.98359360
## 161 0.97373317
## 162 0.97202701
## 163 0.96604103
## 164 0.96340584
## 165 0.96228785
## 166 0.95786280
## 167 0.94504369
## 168 0.94083967
## 169 0.93200618
## 170 0.92440861
## 171 0.92296581
## 172 0.91902524
## 173 0.90537021
## 174 0.89880873
## 175 0.88565613
## 176 0.87691523
## 177 0.84535810
## 178 0.83426092
## 179 0.82533423
## 180 0.80602519
## 181 0.80081073
## 182 0.78422924
## 183 0.78023475
## 184 0.77878881
## 185 0.77059418
## 186 0.74366547
## 187 0.74332199
## 188 0.74304837
## 189 0.73761788
## 190 0.72729320
## 191 0.72428242
## 192 0.70991823
## 193 0.70320176
## 194 0.68553316
## 195 0.68348187
## 196 0.66879842
## 197 0.65086850
## 198 0.63153692
## 199 0.63045978
## 200 0.62329815
## 201 0.62324832
## 202 0.59649278
## 203 0.57948046
## 204 0.56836716
## 205 0.56415113
## 206 0.55929710
## 207 0.53307913
## 208 0.52686230
## 209 0.51718553
## 210 0.51150921
## 211 0.50443345
## 212 0.48693500
## 213 0.46994094
## 214 0.44990066
## 215 0.41992151
## 216 0.40743541
## 217 0.40119664
## 218 0.39861823
## 219 0.38414381
## 220 0.35453799
## 221 0.34297353
## 222 0.33675475
## 223 0.32448482
## 224 0.29935782
## 225 0.27862841
## 226 0.24195568
## 227 0.23154817
## 228 0.20813450
## 229 0.20254882
## 230 0.19344362
## 231 0.16243126
## 232 0.14555137
## 233 0.12565333
## 234 0.09128414
## 235 0.06701000
## 236 0.06677030
## 237 0.05496306
## 238 0.04558475
## 239 0.03189573
## 240 0.00000000
## 241 0.00000000
## 242 0.00000000
## 243 0.00000000
## 244 0.00000000
## 245 0.00000000
## 246 0.00000000
## 247 0.00000000
## 248 0.00000000
## 249 0.00000000
## 250 0.00000000
if(METHOD_FEATURE_FLAG==1){
# For the multi-class case, keep each feature's maximum importance
# across classes in a new MaxImportance column.
importance_model_LRM1_df$Feature<-rownames(importance_model_LRM1_df)
importance_model_LRM1_df <- importance_model_LRM1_df %>%
mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
arrange(desc(MaxImportance))
print(importance_model_LRM1_df)
}
if (!require(reshape2)) {
install.packages("reshape2")
library(reshape2)
} else {
library(reshape2)
}
if(METHOD_FEATURE_FLAG == 1){
importance_melted_LRM1_df <- importance_model_LRM1_df %>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_LRM1_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
if(METHOD_FEATURE_FLAG == 1){
print(importance_model_LRM1_df %>% head(20))
print("The top 20 features based on the maximum-importance method:")
print(head(importance_model_LRM1_df,n=20)$Feature)
importance_melted_LRM1_df <- importance_model_LRM1_df %>%
head(20)%>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_LRM1_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
table(df_LRM1$DX)
##
## CN MCI
## 221 333
prop.table(table(df_LRM1$DX))
##
## CN MCI
## 0.398917 0.601083
table(trainData$DX)
##
## CN MCI
## 155 234
prop.table(table(trainData$DX))
##
## CN MCI
## 0.3984576 0.6015424
barplot(table(df_LRM1$DX), main = "Whole Data Class Distribution")
For the training data set:
barplot(table(trainData$DX), main = "Train Data Class Distribution")
Let’s calculate the imbalance ratio: the number of samples in the majority class divided by the number in the minority class. A high ratio indicates severe class imbalance.
class_counts <- table(df_LRM1$DX)
imbalance_ratio <- max(class_counts) / min(class_counts)
print("The imbalance ratio of the whole data set is:")
## [1] "The imbalance ratio of the whole data set is:"
print(imbalance_ratio)
## [1] 1.506787
class_counts <- table(trainData$DX)
imbalance_ratio <- max(class_counts) / min(class_counts)
print("The imbalance ratio of the training data set is:")
## [1] "The imbalance ratio of the training data set is:"
print(imbalance_ratio)
## [1] 1.509677
Let’s run a chi-square goodness-of-fit test, which can determine whether the class distribution deviates significantly from a balanced one. The p-value reported by the test indicates how significant the class imbalance is.
chisq.test(table(df_LRM1$DX))
##
## Chi-squared test for given probabilities
##
## data: table(df_LRM1$DX)
## X-squared = 22.643, df = 1, p-value = 1.951e-06
chisq.test(table(trainData$DX))
##
## Chi-squared test for given probabilities
##
## data: table(trainData$DX)
## X-squared = 16.044, df = 1, p-value = 6.19e-05
library(smotefamily)
smote_data_LGR_1 <- SMOTE(X = trainData[, !names(trainData) %in% "DX"], target = trainData$DX, K = 5, dup_size = 1)
# Extract the new balanced dataset
balanced_data_LGR_1 <- smote_data_LGR_1$data
colnames(balanced_data_LGR_1)[colnames(balanced_data_LGR_1) == "class"] <- "DX"
table(balanced_data_LGR_1$DX)
##
## CN MCI
## 310 234
dim(balanced_data_LGR_1)
## [1] 544 251
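As a quick sanity check (a sketch reusing the objects created above), the imbalance ratio can be recomputed on the SMOTE-balanced data; with 310 CN versus 234 MCI it drops from roughly 1.51 to roughly 1.32:

```r
# Recompute the imbalance ratio on the balanced data (CN is now the majority class)
balanced_counts <- table(balanced_data_LGR_1$DX)
balanced_imbalance_ratio <- max(balanced_counts) / min(balanced_counts)
print(balanced_imbalance_ratio)  # 310/234, ~1.32, down from ~1.51 before SMOTE
```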
ctrl <- trainControl(method = "cv", number = 5)
model_LRM2 <- caret::train(DX ~ ., data = balanced_data_LGR_1, method = "glmnet", trControl = ctrl)
predictions <- predict(model_LRM2, newdata = testData)
caret::confusionMatrix(predictions, testData$DX)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CN MCI
## CN 55 17
## MCI 11 82
##
## Accuracy : 0.8303
## 95% CI : (0.7642, 0.8842)
## No Information Rate : 0.6
## P-Value [Acc > NIR] : 1.475e-10
##
## Kappa : 0.6517
##
## Mcnemar's Test P-Value : 0.3447
##
## Sensitivity : 0.8333
## Specificity : 0.8283
## Pos Pred Value : 0.7639
## Neg Pred Value : 0.8817
## Prevalence : 0.4000
## Detection Rate : 0.3333
## Detection Prevalence : 0.4364
## Balanced Accuracy : 0.8308
##
## 'Positive' Class : CN
##
print(model_LRM2)
## glmnet
##
## 544 samples
## 250 predictors
## 2 classes: 'CN', 'MCI'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 435, 435, 435, 435, 436
## Resampling results across tuning parameters:
##
## alpha lambda Accuracy Kappa
## 0.10 0.0001998477 0.9448522 0.8867808
## 0.10 0.0019984771 0.9430003 0.8828748
## 0.10 0.0199847707 0.9337920 0.8633745
## 0.55 0.0001998477 0.9099049 0.8146519
## 0.55 0.0019984771 0.8951580 0.7836150
## 0.55 0.0199847707 0.8400442 0.6660552
## 1.00 0.0001998477 0.8896364 0.7719908
## 1.00 0.0019984771 0.8804791 0.7529876
## 1.00 0.0199847707 0.8032960 0.5894196
##
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0.1 and lambda = 0.0001998477.
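The winning hyper-parameters can also be pulled from the fitted caret object directly, rather than read off the printed resampling table (a minimal sketch):

```r
# Best (alpha, lambda) pair chosen by 5-fold CV
print(model_LRM2$bestTune)
# CV accuracy at exactly that setting, rather than averaged over the whole grid
best_row <- merge(model_LRM2$bestTune, model_LRM2$results)
print(best_row$Accuracy)
```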
train_predictions <- predict(model_LRM2, newdata = trainData, type = "raw")
train_accuracy <- mean(train_predictions == trainData$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 1"
mean_accuracy_model_LRM2 <- mean(model_LRM2$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_model_LRM2)
## [1] 0.8933515
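Since the training accuracy is 1 while the mean cross-validated accuracy is about 0.89, the gap between the two gives a quick overfitting indicator (a sketch reusing the variables computed above):

```r
# In-sample accuracy minus mean CV accuracy; a large gap suggests memorisation
overfit_gap <- train_accuracy - mean_accuracy_model_LRM2
print(overfit_gap)  # ~0.107
```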
importance_model_LRM2 <- varImp(model_LRM2)
print(importance_model_LRM2)
## glmnet variable importance
##
## only 20 most important variables shown (out of 250)
##
## Overall
## PC2 100.00
## cg27272246 60.07
## cg00004073 57.99
## cg13405878 53.41
## cg23432430 52.35
## cg14710850 50.66
## cg02981548 50.33
## cg06833284 50.28
## cg03924089 48.24
## cg14168080 47.45
## cg07480955 47.33
## cg21243064 46.57
## cg20685672 46.54
## cg14687298 46.43
## cg08788093 46.12
## cg08861434 45.06
## cg14582632 44.65
## cg00086247 43.99
## cg02225060 43.76
## cg19471911 43.67
plot(importance_model_LRM2, top = 20, main = "Variable Importance Plot")
importance_model_LRM2_df<-importance_model_LRM2$importance
if(METHOD_FEATURE_FLAG==3 || METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==5 || METHOD_FEATURE_FLAG==6){
importance_final_model_LRM2 <- varImp(model_LRM2$finalModel)
library(dplyr)
# Keep the CpG names as a column: dplyr::arrange() silently drops data-frame rownames
ordered_importance_final_model_LRM2 <- importance_final_model_LRM2 %>%
  tibble::rownames_to_column("Feature") %>%
  arrange(desc(Overall))
print(ordered_importance_final_model_LRM2)
}
## Overall
## 1 7.209807931
## 2 4.331196755
## 3 4.181306405
## 4 3.850516474
## 5 3.774544613
## 6 3.652373574
## 7 3.628913670
## 8 3.625207095
## 9 3.477937140
## 10 3.421034531
## (rows 11-250 omitted: importances decrease monotonically, with the last 9 exactly 0)
if(METHOD_FEATURE_FLAG==1){
# Multi-class case: for each feature, take the maximum
# importance across the three classes and store it in a MaxImportance column
importance_model_LRM2_df$Feature<-rownames(importance_model_LRM2_df)
importance_model_LRM2_df <- importance_model_LRM2_df %>%
mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
arrange(desc(MaxImportance))
print(importance_model_LRM2_df)
}
if(METHOD_FEATURE_FLAG == 1){
importance_melted_LRM2_df <- importance_model_LRM2_df %>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_LRM2_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
if(METHOD_FEATURE_FLAG == 1){
print(importance_model_LRM2_df %>% head(20))
print("the top 20 features based on max way:")
print(head(importance_model_LRM2_df,n=20)$Feature)
importance_melted_LRM2_df <- importance_model_LRM2_df %>%
head(20)%>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_LRM2_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
if(METHOD_FEATURE_FLAG == 5){
prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
roc_curve <- roc(testData$DX, prob_predictions[, "MCI"], levels = rev(levels(testData$DX)))
auc_value <- roc_curve$auc
print(roc_curve)
print("The auc value is:")
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls > cases
##
## Call:
## roc.default(response = testData$DX, predictor = prob_predictions[, "MCI"], levels = rev(levels(testData$DX)))
##
## Data: prob_predictions[, "MCI"] in 99 controls (testData$DX MCI) > 66 cases (testData$DX CN).
## Area under the curve: 0.889
## [1] "The auc value is:"
## Area under the curve: 0.889
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
roc_curve <- roc(testData$DX, prob_predictions[, "Dementia"], levels = rev(levels(testData$DX)))
auc_value <- roc_curve$auc
print(roc_curve)
print("The auc value is:")
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
roc_curve <- roc(testData$DX, prob_predictions[, "CI"], levels = rev(levels(testData$DX)))
auc_value <- roc_curve$auc
print(roc_curve)
print("The auc value is:")
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 1){
prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(testData$DX)
for (class in classes) {
binary_labels <- ifelse(testData$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
plot(roc_curves[[1]], col = "blue",
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = i+1, lwd = 2)
}
legend("bottomright", legend = classes, col = 1:length(classes)+1, lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
}
df_ENM1<-processed_data
featureName_ENM1<-AfterProcess_FeatureName
library(caret)
set.seed(123)
trainIndex <- createDataPartition(df_ENM1$DX, p = 0.7, list = FALSE)
trainData_ENM1 <- df_ENM1[trainIndex, ]
testData_ENM1 <- df_ENM1[-trainIndex, ]
ctrl <- trainControl(method = "cv", number = 5)
param_grid <- expand.grid(alpha = 0:1, lambda = seq(0.001, 1, length = 20))
elastic_net_model1 <- caret::train(DX ~ ., data = trainData_ENM1, method = "glmnet",
trControl = ctrl, tuneGrid = param_grid)
print(elastic_net_model1)
## glmnet
##
## 389 samples
## 250 predictors
## 2 classes: 'CN', 'MCI'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 311, 312, 311, 311, 311
## Resampling results across tuning parameters:
##
## alpha lambda Accuracy Kappa
## 0 0.00100000 0.8868465 0.76127908
## 0 0.05357895 0.8919414 0.77084980
## 0 0.10615789 0.8841825 0.75261969
## 0 0.15873684 0.8970363 0.77937492
## 0 0.21131579 0.8970363 0.77913058
## 0 0.26389474 0.8893107 0.76157989
## 0 0.31647368 0.8867466 0.75582277
## 0 0.36905263 0.8867466 0.75518692
## 0 0.42163158 0.8816184 0.74342628
## 0 0.47421053 0.8764236 0.73052148
## 0 0.52678947 0.8687313 0.71285167
## 0 0.57936842 0.8687313 0.71285167
## 0 0.63194737 0.8661672 0.70722723
## 0 0.68452632 0.8636031 0.70119862
## 0 0.73710526 0.8662005 0.70647948
## 0 0.78968421 0.8636364 0.70059229
## 0 0.84226316 0.8507160 0.66906941
## 0 0.89484211 0.8455544 0.65642963
## 0 0.94742105 0.8429903 0.64967533
## 0 1.00000000 0.8378621 0.63789631
## 1 0.00100000 0.7712288 0.51237857
## 1 0.05357895 0.6117549 0.09849449
## 1 0.10615789 0.6015318 0.00000000
## 1 0.15873684 0.6015318 0.00000000
## 1 0.21131579 0.6015318 0.00000000
## 1 0.26389474 0.6015318 0.00000000
## 1 0.31647368 0.6015318 0.00000000
## 1 0.36905263 0.6015318 0.00000000
## 1 0.42163158 0.6015318 0.00000000
## 1 0.47421053 0.6015318 0.00000000
## 1 0.52678947 0.6015318 0.00000000
## 1 0.57936842 0.6015318 0.00000000
## 1 0.63194737 0.6015318 0.00000000
## 1 0.68452632 0.6015318 0.00000000
## 1 0.73710526 0.6015318 0.00000000
## 1 0.78968421 0.6015318 0.00000000
## 1 0.84226316 0.6015318 0.00000000
## 1 0.89484211 0.6015318 0.00000000
## 1 0.94742105 0.6015318 0.00000000
## 1 1.00000000 0.6015318 0.00000000
##
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0 and lambda = 0.2113158.
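Because the selected alpha is 0 (pure ridge) and lambda = 0.211 sits in the interior of the grid, a finer follow-up grid around that neighbourhood may be worth trying. A sketch (the model name below is hypothetical, and the fit is left commented out so the chunk stays cheap):

```r
# Hypothetical refinement grid centred on the winning alpha = 0, lambda ~ 0.21
param_grid_fine <- expand.grid(alpha = c(0, 0.05, 0.1),
                               lambda = seq(0.10, 0.35, length = 15))
# elastic_net_model1_fine <- caret::train(DX ~ ., data = trainData_ENM1,
#                                         method = "glmnet", trControl = ctrl,
#                                         tuneGrid = param_grid_fine)
```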
mean_accuracy_elastic_net_model1 <- mean(elastic_net_model1$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
FeatEval_Mean_mean_accuracy_cv_ENM1<-mean_accuracy_elastic_net_model1
print(FeatEval_Mean_mean_accuracy_cv_ENM1)
## [1] 0.7415659
train_predictions <- predict(elastic_net_model1, newdata = trainData_ENM1, type = "raw")
train_accuracy <- mean(train_predictions == trainData_ENM1$DX)
FeatEval_Mean_ENM1_trainAccuracy<-train_accuracy
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 0.989717223650386"
print(FeatEval_Mean_ENM1_trainAccuracy)
## [1] 0.9897172
predictions <- predict(elastic_net_model1, newdata = testData_ENM1)
cm_FeatEval_Mean_ENM1 <- caret::confusionMatrix(predictions,testData_ENM1$DX)
print(cm_FeatEval_Mean_ENM1)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CN MCI
## CN 50 10
## MCI 16 89
##
## Accuracy : 0.8424
## 95% CI : (0.7777, 0.8944)
## No Information Rate : 0.6
## P-Value [Acc > NIR] : 1.263e-11
##
## Kappa : 0.6667
##
## Mcnemar's Test P-Value : 0.3268
##
## Sensitivity : 0.7576
## Specificity : 0.8990
## Pos Pred Value : 0.8333
## Neg Pred Value : 0.8476
## Prevalence : 0.4000
## Detection Rate : 0.3030
## Detection Prevalence : 0.3636
## Balanced Accuracy : 0.8283
##
## 'Positive' Class : CN
##
cm_FeatEval_Mean_ENM1_Accuracy<-cm_FeatEval_Mean_ENM1$overall["Accuracy"]
cm_FeatEval_Mean_ENM1_Kappa<-cm_FeatEval_Mean_ENM1$overall["Kappa"]
print(cm_FeatEval_Mean_ENM1_Accuracy)
## Accuracy
## 0.8424242
print(cm_FeatEval_Mean_ENM1_Kappa)
## Kappa
## 0.6666667
importance_elastic_net_model1<- varImp(elastic_net_model1)
print(importance_elastic_net_model1)
## glmnet variable importance
##
## only 20 most important variables shown (out of 250)
##
## Overall
## PC2 100.00
## cg20685672 61.35
## cg23432430 59.74
## cg27272246 55.19
## cg00086247 55.02
## cg02981548 55.00
## cg16652920 54.32
## cg13405878 53.75
## cg03924089 51.43
## cg00962106 50.95
## cg09015880 50.36
## cg02225060 50.17
## cg14710850 49.98
## cg06833284 49.33
## cg07028768 48.49
## cg14687298 47.95
## cg06634367 47.57
## cg12543766 46.46
## cg17042243 45.29
## cg24433124 45.17
plot(importance_elastic_net_model1, top = 20, main = "Variable Importance Plot")
importance_elastic_net_model1_df<-importance_elastic_net_model1$importance
if(METHOD_FEATURE_FLAG==3 || METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG==5 ||METHOD_FEATURE_FLAG==6 ){
importance_elastic_net_final_model1 <- varImp(elastic_net_model1$finalModel)
library(dplyr)
# Keep the CpG names as a column: dplyr::arrange() silently drops data-frame rownames
Ordered_importance_elastic_net_final_model1 <- importance_elastic_net_final_model1 %>%
  tibble::rownames_to_column("Feature") %>%
  arrange(desc(Overall))
print(Ordered_importance_elastic_net_final_model1)
}
## Overall
## 1 0.979179035
## 2 0.601527677
## 3 0.585789097
## 4 0.541290406
## 5 0.539617524
## 6 0.539511409
## 7 0.532822670
## 8 0.527276830
## 9 0.504548197
## 10 0.499936333
## (rows 11-250 omitted: importances decrease monotonically to 0.002)
if(METHOD_FEATURE_FLAG==1){
# Multi-class case: for each feature, take the maximum
# importance across the three classes and store it in a MaxImportance column
importance_elastic_net_model1_df$Feature<-rownames(importance_elastic_net_model1_df)
importance_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
arrange(desc(MaxImportance))
print(importance_elastic_net_model1_df)
}
if(METHOD_FEATURE_FLAG == 1){
importance_melted_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_elastic_net_model1_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
if(METHOD_FEATURE_FLAG == 1){
print(importance_elastic_net_model1_df %>% head(20))
print("the top 20 features based on max way:")
print(head(importance_elastic_net_model1_df,n=20)$Feature)
importance_melted_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
head(20)%>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_elastic_net_model1_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
if(METHOD_FEATURE_FLAG == 5){
prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
roc_curve <- roc(testData_ENM1$DX,
prob_predictions[, "MCI"],
levels = rev(levels(testData_ENM1$DX)))
auc_value <- roc_curve$auc
FeatEval_Mean_ENM1_AUC <- auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls > cases
## Area under the curve: 0.9123
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
roc_curve <- roc(testData_ENM1$DX,
prob_predictions[, "Dementia"],
levels = rev(levels(testData_ENM1$DX)))
auc_value <- roc_curve$auc
FeatEval_Mean_ENM1_AUC <- auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
roc_curve <- roc(testData_ENM1$DX,
prob_predictions[, "CI"],
levels = rev(levels(testData_ENM1$DX)))
auc_value <- roc_curve$auc
FeatEval_Mean_ENM1_AUC <- auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG ==1){
prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(testData_ENM1$DX)
for (class in classes) {
binary_labels <- ifelse(testData_ENM1$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
plot(roc_curves[[1]], col = "blue",
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = i+1, lwd = 2)
}
legend("bottomright", legend = classes, col = 1:length(classes)+1, lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
FeatEval_Mean_ENM1_AUC <- mean_auc
}
library(caret)
library(xgboost)
library(dplyr)
library(doParallel)
numCores <- detectCores() - 1
c2 <- makeCluster(numCores)
registerDoParallel(c2)
df_XGB1<-processed_data
featureName_XGB1<-AfterProcess_FeatureName
set.seed(123)
trainIndex <- createDataPartition(df_XGB1$DX, p = 0.7, list = FALSE)
trainData_XGB1<- df_XGB1[trainIndex, ]
testData_XGB1 <- df_XGB1[-trainIndex, ]
cv_control <- trainControl(method = "cv", number = 5, allowParallel = TRUE)
xgb_model <- caret::train(
DX ~ ., data = trainData_XGB1,
method = "xgbTree", trControl = cv_control,
metric = "Accuracy"
)
print(xgb_model)
## eXtreme Gradient Boosting
##
## 389 samples
## 250 predictors
## 2 classes: 'CN', 'MCI'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 311, 312, 311, 311, 311
## Resampling results across tuning parameters:
##
## eta max_depth colsample_bytree subsample nrounds Accuracy Kappa
## 0.3 1 0.6 0.50 50 0.6529471 0.25399993
## 0.3 1 0.6 0.50 100 0.6504163 0.26250610
## 0.3 1 0.6 0.50 150 0.6836830 0.32984722
## 0.3 1 0.6 0.75 50 0.6425574 0.22968347
## 0.3 1 0.6 0.75 100 0.6657676 0.29217577
## 0.3 1 0.6 0.75 150 0.6555112 0.27260343
## 0.3 1 0.6 1.00 50 0.6040293 0.13478262
## 0.3 1 0.6 1.00 100 0.6450549 0.23636905
## 0.3 1 0.6 1.00 150 0.6476856 0.24924395
## 0.3 1 0.8 0.50 50 0.6066267 0.15661951
## 0.3 1 0.8 0.50 100 0.6528472 0.26647887
## 0.3 1 0.8 0.50 150 0.6761572 0.32057272
## 0.3 1 0.8 0.75 50 0.6167832 0.18789423
## 0.3 1 0.8 0.75 100 0.6348984 0.22981298
## 0.3 1 0.8 0.75 150 0.6708625 0.29965010
## 0.3 1 0.8 1.00 50 0.5911755 0.09690859
## 0.3 1 0.8 1.00 100 0.6426573 0.22680457
## 0.3 1 0.8 1.00 150 0.6658009 0.29024786
## 0.3 2 0.6 0.50 50 0.6657010 0.27310441
## 0.3 2 0.6 0.50 100 0.6940060 0.33764304
## 0.3 2 0.6 0.50 150 0.6889777 0.32705355
## 0.3 2 0.6 0.75 50 0.6270729 0.20340643
## 0.3 2 0.6 0.75 100 0.6657010 0.29223665
## 0.3 2 0.6 0.75 150 0.6863470 0.33803522
## 0.3 2 0.6 1.00 50 0.6554779 0.24435449
## 0.3 2 0.6 1.00 100 0.6888778 0.32225058
## 0.3 2 0.6 1.00 150 0.6915085 0.33000793
## 0.3 2 0.8 0.50 50 0.6631702 0.26649453
## 0.3 2 0.8 0.50 100 0.7144522 0.38008758
## 0.3 2 0.8 0.50 150 0.7247752 0.40135539
## 0.3 2 0.8 0.75 50 0.6245754 0.19631186
## 0.3 2 0.8 0.75 100 0.6605062 0.27300585
## 0.3 2 0.8 0.75 150 0.6964702 0.34991940
## 0.3 2 0.8 1.00 50 0.6117549 0.15768558
## 0.3 2 0.8 1.00 100 0.6528805 0.24968386
## 0.3 2 0.8 1.00 150 0.6657010 0.28005049
## 0.3 3 0.6 0.50 50 0.6734932 0.29099155
## 0.3 3 0.6 0.50 100 0.7196470 0.39030473
## 0.3 3 0.6 0.50 150 0.7196470 0.39640782
## 0.3 3 0.6 0.75 50 0.6940060 0.33850954
## 0.3 3 0.6 0.75 100 0.7095238 0.36779122
## 0.3 3 0.6 0.75 150 0.7043623 0.35628235
## 0.3 3 0.6 1.00 50 0.6504163 0.23765025
## 0.3 3 0.6 1.00 100 0.6580753 0.25015604
## 0.3 3 0.6 1.00 150 0.6657676 0.27030775
## 0.3 3 0.8 0.50 50 0.6633034 0.27883083
## 0.3 3 0.8 0.50 100 0.6685315 0.28512362
## 0.3 3 0.8 0.50 150 0.6993007 0.35362458
## 0.3 3 0.8 0.75 50 0.6605395 0.25526201
## 0.3 3 0.8 0.75 100 0.6758908 0.28215701
## 0.3 3 0.8 0.75 150 0.6887779 0.31463807
## 0.3 3 0.8 1.00 50 0.6655345 0.27032985
## 0.3 3 0.8 1.00 100 0.6629038 0.26513881
## 0.3 3 0.8 1.00 150 0.6680653 0.27607371
## 0.4 1 0.6 0.50 50 0.6220446 0.20771855
## 0.4 1 0.6 0.50 100 0.6914086 0.34536833
## 0.4 1 0.6 0.50 150 0.7196137 0.40606267
## 0.4 1 0.6 0.75 50 0.6193806 0.18296631
## 0.4 1 0.6 0.75 100 0.6374625 0.22169027
## 0.4 1 0.6 0.75 150 0.6476856 0.25016613
## 0.4 1 0.6 1.00 50 0.5731935 0.07394206
## 0.4 1 0.6 1.00 100 0.6400266 0.22250889
## 0.4 1 0.6 1.00 150 0.6657343 0.28458883
## 0.4 1 0.8 0.50 50 0.5987013 0.15285736
## 0.4 1 0.8 0.50 100 0.6270063 0.21683625
## 0.4 1 0.8 0.50 150 0.6553114 0.27712863
## 0.4 1 0.8 0.75 50 0.6298035 0.20555089
## 0.4 1 0.8 0.75 100 0.6633034 0.28763412
## 0.4 1 0.8 0.75 150 0.6812854 0.32278937
## 0.4 1 0.8 1.00 50 0.5912421 0.11764697
## 0.4 1 0.8 1.00 100 0.6631036 0.27876271
## 0.4 1 0.8 1.00 150 0.6528472 0.25963749
## 0.4 2 0.6 0.50 50 0.6733933 0.30176778
## 0.4 2 0.6 0.50 100 0.7015651 0.36520618
## 0.4 2 0.6 0.50 150 0.7118215 0.38833926
## 0.4 2 0.6 0.75 50 0.6298368 0.21083720
## 0.4 2 0.6 0.75 100 0.6760573 0.30648155
## 0.4 2 0.6 0.75 150 0.6708958 0.30163859
## 0.4 2 0.6 1.00 50 0.6322344 0.19757195
## 0.4 2 0.6 1.00 100 0.6682651 0.27984736
## 0.4 2 0.6 1.00 150 0.6811189 0.31033367
## 0.4 2 0.8 0.50 50 0.6478854 0.25065076
## 0.4 2 0.8 0.50 100 0.6633700 0.28214533
## 0.4 2 0.8 0.50 150 0.6864469 0.33207386
## 0.4 2 0.8 0.75 50 0.6657010 0.27982886
## 0.4 2 0.8 0.75 100 0.6784882 0.30628947
## 0.4 2 0.8 0.75 150 0.6759574 0.30334484
## 0.4 2 0.8 1.00 50 0.6246420 0.20044061
## 0.4 2 0.8 1.00 100 0.6580753 0.26754342
## 0.4 2 0.8 1.00 150 0.6554779 0.26006500
## 0.4 3 0.6 0.50 50 0.6631702 0.25700116
## 0.4 3 0.6 0.50 100 0.6991675 0.34330277
## 0.4 3 0.6 0.50 150 0.6862471 0.31490858
## 0.4 3 0.6 0.75 50 0.6862471 0.31286307
## 0.4 3 0.6 0.75 100 0.6990676 0.34526929
## 0.4 3 0.6 0.75 150 0.6811189 0.31167417
## 0.4 3 0.6 1.00 50 0.6657343 0.27779411
## 0.4 3 0.6 1.00 100 0.6760573 0.30260379
## 0.4 3 0.6 1.00 150 0.6683317 0.28790631
## 0.4 3 0.8 0.50 50 0.6247419 0.19164940
## 0.4 3 0.8 0.50 100 0.6452214 0.23698687
## 0.4 3 0.8 0.50 150 0.6450882 0.23223872
## 0.4 3 0.8 0.75 50 0.6733600 0.29626207
## 0.4 3 0.8 0.75 100 0.6682318 0.28008896
## 0.4 3 0.8 0.75 150 0.6913753 0.32983582
## 0.4 3 0.8 1.00 50 0.6427239 0.22259314
## 0.4 3 0.8 1.00 100 0.6401598 0.22167916
## 0.4 3 0.8 1.00 150 0.6606061 0.26537674
##
## Tuning parameter 'gamma' was held constant at a value of 0
## Tuning parameter 'min_child_weight' was held constant at a value of 1
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were nrounds = 150, max_depth = 2, eta = 0.3, gamma = 0, colsample_bytree = 0.8, min_child_weight = 1 and subsample = 0.5.
mean_accuracy_xgb_model<- mean(xgb_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_xgb_model)
## [1] 0.6622985
FeatEval_Mean_mean_accuracy_cv_xgb<-mean_accuracy_xgb_model
print(FeatEval_Mean_mean_accuracy_cv_xgb)
## [1] 0.6622985
train_predictions <- predict(xgb_model, newdata = trainData_XGB1, type = "raw")
train_accuracy <- mean(train_predictions == trainData_XGB1$DX)
FeatEval_Mean_xgb_trainAccuracy <- train_accuracy
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 1"
print(FeatEval_Mean_xgb_trainAccuracy)
## [1] 1
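A training accuracy of 1 against a cross-validated accuracy around 0.66 suggests the model memorizes the training set. A minimal sketch of a gap check (the helper `overfit_gap` and its threshold are hypothetical, not part of the pipeline; it assumes a caret model object like `xgb_model` and the `train_accuracy` computed above):

```r
# Hypothetical helper: flag a large train-vs-CV accuracy gap as likely overfitting.
overfit_gap <- function(model, train_acc, threshold = 0.15) {
  cv_acc <- max(model$results$Accuracy)   # best resampled accuracy across the grid
  gap <- train_acc - cv_acc
  if (gap > threshold) {
    message(sprintf("Possible overfitting: train %.3f vs CV %.3f (gap %.3f)",
                    train_acc, cv_acc, gap))
  }
  gap
}
# overfit_gap(xgb_model, train_accuracy)
```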
predictions <- predict(xgb_model, newdata = testData_XGB1)
cm_FeatEval_Mean_xgb <-caret::confusionMatrix(predictions,testData_XGB1$DX)
print(cm_FeatEval_Mean_xgb)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CN MCI
## CN 37 17
## MCI 29 82
##
## Accuracy : 0.7212
## 95% CI : (0.6462, 0.7881)
## No Information Rate : 0.6
## P-Value [Acc > NIR] : 0.0007839
##
## Kappa : 0.401
##
## Mcnemar's Test P-Value : 0.1048330
##
## Sensitivity : 0.5606
## Specificity : 0.8283
## Pos Pred Value : 0.6852
## Neg Pred Value : 0.7387
## Prevalence : 0.4000
## Detection Rate : 0.2242
## Detection Prevalence : 0.3273
## Balanced Accuracy : 0.6944
##
## 'Positive' Class : CN
##
cm_FeatEval_Mean_xgb_Accuracy <-cm_FeatEval_Mean_xgb$overall["Accuracy"]
cm_FeatEval_Mean_xgb_Kappa <-cm_FeatEval_Mean_xgb$overall["Kappa"]
print(cm_FeatEval_Mean_xgb_Accuracy)
## Accuracy
## 0.7212121
print(cm_FeatEval_Mean_xgb_Kappa)
## Kappa
## 0.4010417
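Cohen's Kappa compares observed agreement with the agreement expected by chance from the confusion matrix margins. A minimal sketch recomputing it by hand from the counts printed above (CN/MCI table: 37, 17, 29, 82):

```r
# Recompute caret's Kappa from the printed 2x2 confusion matrix.
tab <- matrix(c(37, 29, 17, 82), nrow = 2,
              dimnames = list(Prediction = c("CN", "MCI"),
                              Reference  = c("CN", "MCI")))
n  <- sum(tab)
po <- sum(diag(tab)) / n                        # observed agreement (= Accuracy, 0.7212)
pe <- sum(rowSums(tab) * colSums(tab)) / n^2    # agreement expected by chance
kappa <- (po - pe) / (1 - pe)
round(kappa, 7)                                 # ~0.4010417, matching caret's output
```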
importance_xgb_model<- varImp(xgb_model)
print(importance_xgb_model)
## xgbTree variable importance
##
## only 20 most important variables shown (out of 250)
##
## Overall
## age.now 100.00
## cg10240127 74.59
## cg24139837 73.26
## cg09015880 66.66
## cg11438323 65.72
## cg14710850 57.54
## cg23066280 57.29
## cg02772171 54.88
## cg05234269 54.41
## cg16089727 53.37
## cg23432430 50.81
## cg09584650 50.42
## cg02981548 50.27
## cg17268094 46.84
## cg16655091 46.62
## cg17186592 45.32
## cg20685672 43.37
## cg03749159 42.69
## cg04248279 41.57
## cg26948066 41.35
plot(importance_xgb_model, top = 20, main = "Variable Importance Plot")
importance_xgb_model_df<-importance_xgb_model$importance
importance <- xgb.importance(model = xgb_model$finalModel)
xgb.plot.importance(importance_matrix = importance)
ordered_importance <- importance[order(-importance$Importance), ]
print(ordered_importance)
## Feature Gain Cover Frequency Importance
## <char> <num> <num> <num> <num>
## 1: age.now 2.794365e-02 0.0257835891 0.017412935 2.794365e-02
## 2: cg10240127 2.084352e-02 0.0120246525 0.007462687 2.084352e-02
## 3: cg24139837 2.047170e-02 0.0205515622 0.014925373 2.047170e-02
## 4: cg09015880 1.862839e-02 0.0108045070 0.007462687 1.862839e-02
## 5: cg11438323 1.836371e-02 0.0119635043 0.012437811 1.836371e-02
## ---
## 195: cg16715186 1.380345e-04 0.0004497984 0.002487562 1.380345e-04
## 196: cg22542451 1.283000e-04 0.0004594690 0.002487562 1.283000e-04
## 197: cg06118351 1.207425e-04 0.0004857072 0.002487562 1.207425e-04
## 198: cg04831745 8.168532e-05 0.0005089801 0.002487562 8.168532e-05
## 199: cg16571124 9.768157e-06 0.0006427061 0.002487562 9.768157e-06
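The `xgb.importance()` table reports three views of importance: Gain (loss reduction contributed by a feature's splits), Cover, and Frequency. The `varImp()` values printed earlier appear to be Gain rescaled so the top feature is 100 (e.g. 0.02084/0.02794 ≈ 74.6 for cg10240127). A short sketch verifying this, assuming the `importance` data.table computed above:

```r
# Rescale Gain to 0-100 and compare against caret's varImp() values.
rescaled <- 100 * importance$Gain / max(importance$Gain)
head(data.frame(Feature  = importance$Feature,
                Gain     = importance$Gain,
                Rescaled = round(rescaled, 2)))
```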
stopCluster(c2)
registerDoSEQ()
if(METHOD_FEATURE_FLAG == 5){
prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
roc_curve <- roc(testData_XGB1$DX,
prob_predictions[, "MCI"],
levels = rev(levels(testData_XGB1$DX)))
auc_value <- roc_curve$auc
FeatEval_Mean_xgb_AUC <-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls > cases
## Area under the curve: 0.7815
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
roc_curve <- roc(testData_XGB1$DX,
prob_predictions[, "Dementia"],
levels = rev(levels(testData_XGB1$DX)))
auc_value <- roc_curve$auc
FeatEval_Mean_xgb_AUC <-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
roc_curve <- roc(testData_XGB1$DX,
prob_predictions[, "CI"],
levels = rev(levels(testData_XGB1$DX)))
auc_value <- roc_curve$auc
FeatEval_Mean_xgb_AUC <-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 1){
prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(testData_XGB1$DX)
for (class in classes) {
binary_labels <- ifelse(testData_XGB1$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
# one color per class so the legend matches the curves
class_cols <- seq_along(classes) + 1
plot(roc_curves[[1]], col = class_cols[1],
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = class_cols[i], lwd = 2)
}
legend("bottomright", legend = classes, col = class_cols, lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
FeatEval_Mean_xgb_AUC <- mean_auc
}
print(FeatEval_Mean_xgb_AUC)
## Area under the curve: 0.7815
library(caret)
library(randomForest)
df_RFM1<-processed_data
featureName_RFM1<-AfterProcess_FeatureName
set.seed(123)
trainIndex <- createDataPartition(df_RFM1$DX, p = 0.7, list = FALSE)
train_data_RFM1 <- df_RFM1[trainIndex, ]
test_data_RFM1 <- df_RFM1[-trainIndex, ]
X_train_RFM1 <- subset(train_data_RFM1, select = -DX)
y_train_RFM1 <- train_data_RFM1$DX
X_test_RFM1 <- subset(test_data_RFM1, select = -DX)
y_test_RFM1 <- test_data_RFM1$DX
ctrl <- trainControl(method = "cv", number = 5, classProbs = TRUE)
rf_model <- caret::train(
DX ~ ., data = train_data_RFM1,
method = "rf", trControl = ctrl,
metric = "Accuracy",
importance = TRUE
)
print(rf_model)
## Random Forest
##
## 389 samples
## 250 predictors
## 2 classes: 'CN', 'MCI'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 311, 312, 311, 311, 311
## Resampling results across tuning parameters:
##
## mtry Accuracy Kappa
## 2 0.6195138 0.05869404
## 126 0.6632368 0.19667705
## 250 0.6529138 0.17337672
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 126.
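By default caret tried only three `mtry` values (2, 126, 250). A finer grid around sqrt(p) (~16 for 250 predictors) may find a better model; a hedged sketch, assuming `train_data_RFM1` and `ctrl` from above (the grid values here are illustrative, not tuned):

```r
# Illustrative custom mtry grid for the random forest.
rf_grid <- expand.grid(mtry = c(8, 16, 32, 64, 126))
rf_model_tuned <- caret::train(
  DX ~ ., data = train_data_RFM1,
  method = "rf", trControl = ctrl,
  tuneGrid = rf_grid,
  metric = "Accuracy", importance = TRUE
)
```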
mean_accuracy_rf_model<- mean(rf_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_rf_model)
## [1] 0.6452214
FeatEval_Mean_mean_accuracy_cv_rf<-mean_accuracy_rf_model
print(FeatEval_Mean_mean_accuracy_cv_rf)
## [1] 0.6452214
train_predictions <- predict(rf_model, newdata = train_data_RFM1, type = "raw")
train_accuracy <- mean(train_predictions == train_data_RFM1$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 1"
FeatEval_Mean_rf_trainAccuracy<-train_accuracy
print(FeatEval_Mean_rf_trainAccuracy)
## [1] 1
predictions <- predict(rf_model, newdata = test_data_RFM1)
cm_FeatEval_Mean_rf<-caret::confusionMatrix(predictions,test_data_RFM1$DX)
print(cm_FeatEval_Mean_rf)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CN MCI
## CN 13 7
## MCI 53 92
##
## Accuracy : 0.6364
## 95% CI : (0.558, 0.7097)
## No Information Rate : 0.6
## P-Value [Acc > NIR] : 0.1914
##
## Kappa : 0.1429
##
## Mcnemar's Test P-Value : 6.267e-09
##
## Sensitivity : 0.19697
## Specificity : 0.92929
## Pos Pred Value : 0.65000
## Neg Pred Value : 0.63448
## Prevalence : 0.40000
## Detection Rate : 0.07879
## Detection Prevalence : 0.12121
## Balanced Accuracy : 0.56313
##
## 'Positive' Class : CN
##
cm_FeatEval_Mean_rf_Accuracy<-cm_FeatEval_Mean_rf$overall["Accuracy"]
print(cm_FeatEval_Mean_rf_Accuracy)
## Accuracy
## 0.6363636
cm_FeatEval_Mean_rf_Kappa<-cm_FeatEval_Mean_rf$overall["Kappa"]
print(cm_FeatEval_Mean_rf_Kappa)
## Kappa
## 0.1428571
importance_rf_model <- varImp(rf_model)
print(importance_rf_model)
## rf variable importance
##
## only 20 most important variables shown (out of 250)
##
## Importance
## age.now 100.00
## cg06286533 79.18
## cg03924089 68.61
## cg08779649 63.35
## cg04971651 63.09
## cg14293999 60.31
## cg00696044 59.71
## cg16655091 57.10
## cg06634367 56.32
## cg03549208 55.97
## cg09289202 55.46
## cg10978526 54.64
## cg06115838 54.22
## cg20685672 53.33
## cg22071943 52.60
## cg03129555 51.46
## cg24433124 51.27
## cg02887598 50.88
## cg10681981 50.28
## cg10240127 49.67
plot(importance_rf_model, top = 20, main = "Variable Importance Plot")
importance_rf_model_df<-importance_rf_model$importance
if( METHOD_FEATURE_FLAG==5 ){
importance_rf_final_model <- varImp(rf_model$finalModel)
library(dplyr)
# arrange() drops data.frame row names, so keep the feature names in a column first
importance_rf_final_model$Feature <- rownames(importance_rf_final_model)
Ordered_importance_rf_final_model <- importance_rf_final_model %>% arrange(desc(MCI))
print(Ordered_importance_rf_final_model)
}
## CN MCI
## 1 4.320250958 4.320250958
## 2 3.059987324 3.059987324
## 3 2.420624518 2.420624518
## 4 2.102274679 2.102274679
## 5 2.086060257 2.086060257
## ...
## 249 -1.594940165 -1.594940165
## 250 -1.732221100 -1.732221100
## (output truncated: 250 rows; the CN and MCI columns are identical for this
## binary model, so either column gives the same feature ranking)
if( METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG==6 ){
importance_rf_final_model <- varImp(rf_model$finalModel)
library(dplyr)
# arrange() drops row names, so keep the feature names in a column first
importance_rf_final_model$Feature <- rownames(importance_rf_final_model)
Ordered_importance_rf_final_model <- importance_rf_final_model %>% arrange(desc(Dementia))
print(Ordered_importance_rf_final_model)
}
if(METHOD_FEATURE_FLAG==3 ){
importance_rf_final_model <- varImp(rf_model$finalModel)
library(dplyr)
# arrange() drops row names, so keep the feature names in a column first
importance_rf_final_model$Feature <- rownames(importance_rf_final_model)
Ordered_importance_rf_final_model <- importance_rf_final_model %>% arrange(desc(CI))
print(Ordered_importance_rf_final_model)
}
if(METHOD_FEATURE_FLAG==1){
# for the multi classification case,
# for each feature, we will choose the maximum importance value
# Add a column for the maximum importance
importance_rf_model_df$Feature<-rownames(importance_rf_model_df)
importance_rf_model_df <- importance_rf_model_df %>%
mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
arrange(desc(MaxImportance))
print(importance_rf_model_df)
}
if(METHOD_FEATURE_FLAG == 1){
importance_melted_rf_model_df <- importance_rf_model_df %>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_rf_model_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
if(METHOD_FEATURE_FLAG == 1){
print(importance_rf_model_df %>% head(20))
print("the top 20 features based on max way:")
print(head(importance_rf_model_df,n=20)$Feature)
importance_melted_rf_model_df <- importance_rf_model_df %>%
head(20)%>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_rf_model_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
if(METHOD_FEATURE_FLAG == 5){
prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
roc_curve <- roc(test_data_RFM1$DX,
prob_predictions[, "MCI"],
levels = rev(levels(test_data_RFM1$DX)))
auc_value <- roc_curve$auc
FeatEval_Mean_rf_AUC<-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls > cases
## Area under the curve: 0.7686
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
roc_curve <- roc(test_data_RFM1$DX,
prob_predictions[, "Dementia"],
levels = rev(levels(test_data_RFM1$DX)))
auc_value <- roc_curve$auc
FeatEval_Mean_rf_AUC<-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
roc_curve <- roc(test_data_RFM1$DX,
prob_predictions[, "CI"],
levels = rev(levels(test_data_RFM1$DX)))
auc_value <- roc_curve$auc
FeatEval_Mean_rf_AUC<-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 1){
prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(test_data_RFM1$DX)
for (class in classes) {
binary_labels <- ifelse(test_data_RFM1$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
# one color per class so the legend matches the curves
class_cols <- seq_along(classes) + 1
plot(roc_curves[[1]], col = class_cols[1],
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = class_cols[i], lwd = 2)
}
legend("bottomright", legend = classes, col = class_cols, lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
FeatEval_Mean_rf_AUC <- mean_auc
}
print(FeatEval_Mean_rf_AUC)
## Area under the curve: 0.7686
df_SVM<-processed_data
featureName_SVM1<-AfterProcess_FeatureName
trainIndex <- createDataPartition(df_SVM$DX, p = 0.7, list = FALSE)
train_data_SVM1 <- df_SVM[trainIndex, ]
test_data_SVM1 <- df_SVM[-trainIndex, ]
X_train_SVM1 <- subset(train_data_SVM1,select = -DX)
y_train_SVM1 <- train_data_SVM1$DX
X_test_SVM1 <- subset(test_data_SVM1, select= -DX )
y_test_SVM1 <- test_data_SVM1$DX
train_control <- trainControl(method = "cv", number = 5, classProbs = TRUE)
svm_model <- caret::train(DX ~ ., data = train_data_SVM1,
method = "svmRadial",
trControl = train_control)
print(svm_model)
## Support Vector Machines with Radial Basis Function Kernel
##
## 389 samples
## 250 predictors
## 2 classes: 'CN', 'MCI'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 311, 311, 311, 312, 311
## Resampling results across tuning parameters:
##
## C Accuracy Kappa
## 0.25 0.8534133 0.7051279
## 0.50 0.8534133 0.7051279
## 1.00 0.8714619 0.7346674
##
## Tuning parameter 'sigma' was held constant at a value of 0.002067489
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were sigma = 0.002067489 and C = 1.
print(svm_model$bestTune)
## sigma C
## 3 0.002067489 1
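caret's default search tried only three `C` values with `sigma` held at a single estimated value, and the best model sat at the edge of the grid (`C = 1`). A hedged sketch of a wider explicit grid, assuming `train_data_SVM1` and `train_control` from above (the grid values are illustrative):

```r
# Illustrative custom grid around the selected sigma and C for svmRadial.
svm_grid <- expand.grid(sigma = c(0.001, 0.002, 0.005),
                        C     = c(0.5, 1, 2, 4))
svm_model_tuned <- caret::train(DX ~ ., data = train_data_SVM1,
                                method = "svmRadial",
                                trControl = train_control,
                                tuneGrid = svm_grid)
```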
mean_accuracy_svm_model<- mean(svm_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_svm_model)
## [1] 0.8594295
FeatEval_Mean_mean_accuracy_cv_svm<-mean_accuracy_svm_model
print(FeatEval_Mean_mean_accuracy_cv_svm)
## [1] 0.8594295
train_predictions <- predict(svm_model, newdata = train_data_SVM1)
train_accuracy <- mean(train_predictions == train_data_SVM1$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 0.997429305912596"
FeatEval_Mean_svm_trainAccuracy <- train_accuracy
print(FeatEval_Mean_svm_trainAccuracy)
## [1] 0.9974293
predictions <- predict(svm_model, newdata = test_data_SVM1)
cm_FeatEval_Mean_svm<-caret::confusionMatrix(predictions,test_data_SVM1$DX)
print(cm_FeatEval_Mean_svm)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CN MCI
## CN 57 12
## MCI 9 87
##
## Accuracy : 0.8727
## 95% CI : (0.8121, 0.9195)
## No Information Rate : 0.6
## P-Value [Acc > NIR] : 1.212e-14
##
## Kappa : 0.7368
##
## Mcnemar's Test P-Value : 0.6625
##
## Sensitivity : 0.8636
## Specificity : 0.8788
## Pos Pred Value : 0.8261
## Neg Pred Value : 0.9062
## Prevalence : 0.4000
## Detection Rate : 0.3455
## Detection Prevalence : 0.4182
## Balanced Accuracy : 0.8712
##
## 'Positive' Class : CN
##
cm_FeatEval_Mean_svm_Accuracy <- cm_FeatEval_Mean_svm$overall["Accuracy"]
cm_FeatEval_Mean_svm_Kappa <- cm_FeatEval_Mean_svm$overall["Kappa"]
print(cm_FeatEval_Mean_svm_Accuracy)
## Accuracy
## 0.8727273
print(cm_FeatEval_Mean_svm_Kappa)
## Kappa
## 0.7368421
Let’s take a look at the feature importance of the trained model.
library(iml)
predictor_SVM <- Predictor$new(svm_model,data = df_SVM,y=df_SVM$DX)
importance_SVM <- FeatureImp$new(predictor_SVM,loss="ce")
print(importance_SVM)
## Interpretation method: FeatureImp
## error function: ce
##
## Analysed predictor:
## Prediction task: classification
## Classes:
##
## Analysed data:
## Sampling from data.frame with 554 rows and 251 columns.
##
##
## Head of results:
## feature importance.05 importance importance.95 permutation.error
## 1 cg23517115 1.090909 1.181818 1.181818 0.04693141
## 2 cg03549208 1.063636 1.181818 1.218182 0.04693141
## 3 cg02225060 1.045455 1.136364 1.136364 0.04512635
## 4 cg17129965 1.009091 1.136364 1.172727 0.04512635
## 5 cg03660162 1.045455 1.136364 1.136364 0.04512635
## 6 cg27577781 1.090909 1.136364 1.136364 0.04512635
plot(importance_SVM)
library(vip)
vip(svm_model, method = "permute", train = train_data_SVM1, target = "DX", nsim = 10, metric = "bal_accuracy", pred_wrapper = predict)
importance_SVM_df<-importance_SVM$results
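The `iml` permutation importance above shuffles each feature and measures the resulting increase in classification error. The idea can be sketched in a few lines for a single feature (the helper `perm_importance_one` is hypothetical, not part of iml; it assumes a caret model such as `svm_model` and a data frame with a `DX` column):

```r
# Hypothetical one-feature permutation importance: shuffle the feature,
# re-predict, and report the drop in accuracy (larger drop => more important).
perm_importance_one <- function(model, data, target, feature) {
  base_acc <- mean(predict(model, newdata = data) == data[[target]])
  shuffled <- data
  shuffled[[feature]] <- sample(shuffled[[feature]])
  perm_acc <- mean(predict(model, newdata = shuffled) == shuffled[[target]])
  base_acc - perm_acc
}
# perm_importance_one(svm_model, test_data_SVM1, "DX", "cg23517115")
```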
if(METHOD_FEATURE_FLAG == 5){
library(e1071)
prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
roc_curve <- roc(test_data_SVM1$DX,
prob_predictions[, "MCI"],
levels = rev(levels(test_data_SVM1$DX)))
print(roc_curve)
print("The AUC value is:")
auc_value <- roc_curve$auc
print(auc_value)
FeatEval_Mean_svm_AUC <- auc_value
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls > cases
##
## Call:
## roc.default(response = test_data_SVM1$DX, predictor = prob_predictions[, "MCI"], levels = rev(levels(test_data_SVM1$DX)))
##
## Data: prob_predictions[, "MCI"] in 99 controls (test_data_SVM1$DX MCI) > 66 cases (test_data_SVM1$DX CN).
## Area under the curve: 0.9386
## [1] "The AUC value is:"
## Area under the curve: 0.9386
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
library(e1071)
prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
roc_curve <- roc(test_data_SVM1$DX,
prob_predictions[, "Dementia"],
levels = rev(levels(test_data_SVM1$DX)))
print(roc_curve)
print("The AUC value is:")
auc_value <- roc_curve$auc
print(auc_value)
FeatEval_Mean_svm_AUC <- auc_value
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
library(e1071)
prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
roc_curve <- roc(test_data_SVM1$DX,
prob_predictions[, "CI"],
levels = rev(levels(test_data_SVM1$DX)))
print(roc_curve)
print("The AUC value is:")
auc_value <- roc_curve$auc
print(auc_value)
FeatEval_Mean_svm_AUC <- auc_value
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 1){
prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(test_data_SVM1$DX)
for (class in classes) {
binary_labels <- ifelse(test_data_SVM1$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
# one color per class so the legend matches the curves
class_cols <- seq_along(classes) + 1
plot(roc_curves[[1]], col = class_cols[1],
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = class_cols[i], lwd = 2)
}
legend("bottomright", legend = classes, col = class_cols, lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
FeatEval_Mean_svm_AUC <- mean_auc
}
print(FeatEval_Mean_svm_AUC)
## Area under the curve: 0.9386
Performance of the output features selected with the median-based method.
processed_dataFrame<-df_selected_Median
processed_data<-output_median_feature
AfterProcess_FeatureName<-Selected_median_imp_Name
print(head(output_median_feature))
## # A tibble: 6 × 251
## DX age.now PC2 cg27272246 cg20685672 cg23432430 cg00004073 cg06833284 cg14710850 cg03924089 cg13405878 cg10240127 cg14687298 cg16652920 cg04248279 cg24433124 cg12543766 cg11331837 cg17129965
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 MCI 82.4 1.47e-2 0.862 0.671 0.948 0.0293 0.913 0.805 0.792 0.455 0.925 0.0421 0.944 0.853 0.132 0.510 0.0369 0.897
## 2 CN 78.6 5.75e-2 0.871 0.793 0.946 0.0279 0.900 0.809 0.737 0.786 0.940 0.148 0.943 0.846 0.599 0.887 0.572 0.881
## 3 CN 80.4 8.37e-2 0.810 0.661 0.942 0.646 0.610 0.829 0.851 0.758 0.906 0.243 0.946 0.833 0.819 0.0282 0.0318 0.886
## 4 MCI 62.9 1.65e-5 0.769 0.0829 0.946 0.412 0.0381 0.850 0.869 0.448 0.926 0.513 0.953 0.597 0.592 0.818 0.930 0.874
## 5 CN 80.7 1.57e-2 0.440 0.845 0.951 0.393 0.915 0.821 0.748 0.340 0.924 0.0362 0.949 0.894 0.574 0.457 0.540 0.882
## 6 MCI 80.6 3.46e-2 0.750 0.657 0.515 0.404 0.901 0.845 0.753 0.734 0.907 0.241 0.949 0.273 0.606 0.804 0.924 0.776
## # ℹ 232 more variables: cg02225060 <dbl>, cg08857872 <dbl>, cg22741595 <dbl>, PC1 <dbl>, cg06961873 <dbl>, cg09015880 <dbl>, cg00962106 <dbl>, cg08198851 <dbl>, cg26901661 <dbl>, cg14168080 <dbl>,
## # cg02981548 <dbl>, cg11169344 <dbl>, cg26081710 <dbl>, cg02621446 <dbl>, cg04412904 <dbl>, cg06231502 <dbl>, cg07028768 <dbl>, cg07104639 <dbl>, cg08880261 <dbl>, cg18285382 <dbl>,
## # cg18819889 <dbl>, cg26219488 <dbl>, cg04971651 <dbl>, cg02631626 <dbl>, cg20078646 <dbl>, cg07480955 <dbl>, cg17042243 <dbl>, cg08861434 <dbl>, cg00086247 <dbl>, cg06634367 <dbl>,
## # cg06483046 <dbl>, cg10978526 <dbl>, cg07640670 <dbl>, cg23517115 <dbl>, cg07504457 <dbl>, cg00696044 <dbl>, cg21812850 <dbl>, cg17429539 <dbl>, cg20398163 <dbl>, cg12228670 <dbl>,
## # cg14564293 <dbl>, cg03979311 <dbl>, cg12784167 <dbl>, cg06115838 <dbl>, cg07158503 <dbl>, cg02772171 <dbl>, cg01921484 <dbl>, cg22933800 <dbl>, cg11438323 <dbl>, cg10039445 <dbl>,
## # cg18816397 <dbl>, cg03660162 <dbl>, cg21243064 <dbl>, cg23352245 <dbl>, cg06118351 <dbl>, cg25208881 <dbl>, cg14175932 <dbl>, cg16715186 <dbl>, cg10681981 <dbl>, cg08788093 <dbl>,
## # cg26679884 <dbl>, cg14293999 <dbl>, cg06546677 <dbl>, cg00819121 <dbl>, cg18526121 <dbl>, cg23066280 <dbl>, cg23923019 <dbl>, cg07478795 <dbl>, cg21139150 <dbl>, cg15633912 <dbl>, …
print(Selected_median_imp_Name)
## [1] "age.now" "PC2" "cg27272246" "cg20685672" "cg23432430" "cg00004073" "cg06833284" "cg14710850" "cg03924089" "cg13405878" "cg10240127" "cg14687298" "cg16652920" "cg04248279" "cg24433124"
## [16] "cg12543766" "cg11331837" "cg17129965" "cg02225060" "cg08857872" "cg22741595" "PC1" "cg06961873" "cg09015880" "cg00962106" "cg08198851" "cg26901661" "cg14168080" "cg02981548" "cg11169344"
## [31] "cg26081710" "cg02621446" "cg04412904" "cg06231502" "cg07028768" "cg07104639" "cg08880261" "cg18285382" "cg18819889" "cg26219488" "cg04971651" "cg02631626" "cg20078646" "cg07480955" "cg17042243"
## [46] "cg08861434" "cg00086247" "cg06634367" "cg06483046" "cg10978526" "cg07640670" "cg23517115" "cg07504457" "cg00696044" "cg21812850" "cg17429539" "cg20398163" "cg12228670" "cg14564293" "cg03979311"
## [61] "cg12784167" "cg06115838" "cg07158503" "cg02772171" "cg01921484" "cg22933800" "cg11438323" "cg10039445" "cg18816397" "cg03660162" "cg21243064" "cg23352245" "cg06118351" "cg25208881" "cg14175932"
## [76] "cg16715186" "cg10681981" "cg08788093" "cg26679884" "cg14293999" "cg06546677" "cg00819121" "cg18526121" "cg23066280" "cg23923019" "cg07478795" "cg21139150" "cg15633912" "cg08138245" "cg15098922"
## [91] "cg21392220" "cg06880438" "cg04664583" "cg14507637" "cg21388339" "cg15501526" "cg25366315" "cg10738648" "cg17738613" "cg08779649" "cg16779438" "cg10738049" "cg15600437" "cg10091792" "cg19471911"
## [106] "cg11286989" "cg02887598" "cg12146221" "cg26948066" "cg27086157" "cg26853071" "cg04316537" "cg06960717" "cg13739190" "cg06403901" "cg14582632" "cg21507367" "cg17186592" "cg07634717" "cg09216282"
## [121] "cg22112152" "cg00084271" "cg19301366" "cg00154902" "cg23836570" "cg05234269" "cg19799454" "cg26705599" "cg04718469" "cg05799088" "cg10666341" "cg12333628" "cg15985500" "cg16202259" "cg16771215"
## [136] "cg27160885" "cg12689021" "cg13815695" "cg14307563" "cg25436480" "cg03982462" "cg00767423" "cg12421087" "cg22535849" "cg11268585" "cg24139837" "cg04728936" "cg01128042" "cg06394820" "cg08669168"
## [151] "cg09727210" "cg06286533" "cg18918831" "cg20678988" "cg11019791" "cg06715136" "cg15138543" "cg11133939" "cg15775217" "cg21415084" "cg20208879" "cg22071943" "cg02372404" "cg05891136" "cg03327352"
## [166] "cg25879395" "cg02356645" "cg04540199" "cg09584650" "cg26642936" "cg21783012" "cg12702014" "cg11540596" "cg16180556" "cg22542451" "cg19097407" "cg06697310" "cg04242342" "cg05155812" "cg26983017"
## [181] "cg00322003" "cg11882358" "cg05130642" "cg04462915" "cg17653352" "cg20300784" "cg07227024" "cg03723481" "cg26069044" "cg06371647" "cg12953206" "cg01008088" "cg14623940" "cg24851651" "cg15586958"
## [196] "cg03395511" "cg21209485" "cg25598710" "cg08745107" "cg00553601" "cg04645024" "cg04831745" "cg14228103" "cg05876883" "cg00512739" "cg03749159" "cg14240646" "cg01153376" "cg03600007" "cg27577781"
## [211] "cg22169467" "cg10993865" "cg16089727" "cg16536985" "cg03129555" "cg03549208" "cg05161773" "cg19377607" "cg22666875" "cg24634455" "cg16655091" "cg06012903" "cg17061760" "cg11187460" "cg06864789"
## [226] "cg25306893" "cg01910713" "cg01549082" "cg03635532" "cg02078724" "cg09247979" "cg03737947" "cg10890644" "cg04888234" "cg12012426" "cg00689685" "cg17268094" "cg17018422" "cg00247094" "cg02495179"
## [241] "cg18949721" "cg12063064" "cg08914944" "cg23159970" "cg09289202" "cg08096656" "cg19242610" "cg01023242" "cg04768387" "cg05392160"
print(head(df_selected_Median))
## DX age.now PC2 cg27272246 cg20685672 cg23432430 cg00004073 cg06833284 cg14710850 cg03924089 cg13405878 cg10240127 cg14687298 cg16652920 cg04248279 cg24433124 cg12543766
## 200223270003_R02C01 MCI 82.4 0.01470293 0.8615873 0.6712101 0.9482702 0.02928535 0.9125144 0.8048592 0.7920449 0.4549662 0.9250553 0.04206702 0.9436000 0.8534976 0.1316610 0.51028134
## 200223270003_R03C01 CN 78.6 0.05745834 0.8705287 0.7932091 0.9455418 0.02787198 0.9003482 0.8090950 0.7370283 0.7858042 0.9403255 0.14813581 0.9431222 0.8458854 0.5987648 0.88741539
## 200223270003_R06C01 CN 80.4 0.08372861 0.8103777 0.6613646 0.9418716 0.64576463 0.6097933 0.8285902 0.8506756 0.7583938 0.9056974 0.24260002 0.9457161 0.8332786 0.8188082 0.02818501
## cg11331837 cg17129965 cg02225060 cg08857872 cg22741595 PC1 cg06961873 cg09015880 cg00962106 cg08198851 cg26901661 cg14168080 cg02981548 cg11169344 cg26081710 cg02621446
## 200223270003_R02C01 0.03692842 0.8972140 0.6828159 0.3395280 0.6525533 -0.214185447 0.5335591 0.5101716 0.9124898 0.6578905 0.8951971 0.4190123 0.1342571 0.6720163 0.8751040 0.8731313
## 200223270003_R03C01 0.57150125 0.8806673 0.8265195 0.8181845 0.1730013 -0.172761185 0.5472606 0.8402106 0.5375751 0.6578186 0.8754981 0.4420256 0.5220037 0.8215477 0.9198212 0.8095534
## 200223270003_R06C01 0.03182862 0.8857237 0.5209552 0.2970779 0.1550739 -0.003667305 0.9415177 0.8472063 0.5040948 0.1272153 0.9021064 0.4355521 0.5098965 0.5941114 0.8801892 0.7511582
## cg04412904 cg06231502 cg07028768 cg07104639 cg08880261 cg18285382 cg18819889 cg26219488 cg04971651 cg02631626 cg20078646 cg07480955 cg17042243 cg08861434 cg00086247 cg06634367
## 200223270003_R02C01 0.05088595 0.7784451 0.4496851 0.6772717 0.40655904 0.3202927 0.9156157 0.9336638 0.8902474 0.6280766 0.06198170 0.3874638 0.2502905 0.8768306 0.1761275 0.8695793
## 200223270003_R03C01 0.07717659 0.7964278 0.8536078 0.7123879 0.85616966 0.2930577 0.9004455 0.9134707 0.9219452 0.1951736 0.89537412 0.3916889 0.2933475 0.4352647 0.2045043 0.9512930
## 200223270003_R06C01 0.08253743 0.7706160 0.8356936 0.8099688 0.03280808 0.8923595 0.9054439 0.9261878 0.9035233 0.2699849 0.08725521 0.4043390 0.2725457 0.8698813 0.6901217 0.9544163
## cg06483046 cg10978526 cg07640670 cg23517115 cg07504457 cg00696044 cg21812850 cg17429539 cg20398163 cg12228670 cg14564293 cg03979311 cg12784167 cg06115838 cg07158503 cg02772171
## 200223270003_R02C01 0.04383925 0.5671930 0.58296513 0.2151144 0.7116230 0.55608424 0.7920645 0.7860900 0.1728144 0.8632174 0.52089591 0.86644909 0.81503498 0.8847724 0.5777146 0.9182018
## 200223270003_R03C01 0.50720277 0.9095713 0.55225610 0.9131440 0.6854539 0.07552381 0.7688711 0.7100923 0.8728944 0.8496212 0.04000662 0.06199853 0.02811410 0.8447916 0.6203543 0.5660559
## 200223270003_R06C01 0.89604910 0.8945157 0.04058533 0.8328364 0.7205633 0.79270858 0.7702792 0.7660838 0.2623391 0.8738949 0.04959460 0.72615553 0.03073269 0.8805585 0.6236025 0.8995479
## cg01921484 cg22933800 cg11438323 cg10039445 cg18816397 cg03660162 cg21243064 cg23352245 cg06118351 cg25208881 cg14175932 cg16715186 cg10681981 cg08788093 cg26679884 cg14293999
## 200223270003_R02C01 0.9098550 0.4830774 0.4863471 0.8833873 0.5472925 0.8691767 0.5191606 0.9377232 0.3633940 0.1851956 0.5746953 0.2742789 0.7035090 0.03911678 0.6793815 0.2836710
## 200223270003_R03C01 0.9093137 0.4142525 0.8984559 0.8954055 0.4940355 0.5160770 0.9167649 0.9375774 0.4714860 0.9092286 0.8779027 0.7946153 0.7382662 0.60934160 0.1848705 0.9172023
## 200223270003_R06C01 0.9204487 0.3956683 0.8722772 0.8832807 0.5337018 0.9026304 0.4862205 0.5932742 0.8655962 0.9265502 0.7288239 0.8124316 0.6971989 0.88380243 0.1701734 0.9168166
## cg06546677 cg00819121 cg18526121 cg23066280 cg23923019 cg07478795 cg21139150 cg15633912 cg08138245 cg15098922 cg21392220 cg06880438 cg04664583 cg14507637 cg21388339 cg15501526
## 200223270003_R02C01 0.4472216 0.9207001 0.4519781 0.07247841 0.8555018 0.8911007 0.01853264 0.1605530 0.8115760 0.9286092 0.8726204 0.8285145 0.5572814 0.9051258 0.2756268 0.6362531
## 200223270003_R03C01 0.8484609 0.9281472 0.4762313 0.57174588 0.3058914 0.9095543 0.43223243 0.9333421 0.1109940 0.9027517 0.8563905 0.7988881 0.5881190 0.9009460 0.2102269 0.6319253
## 200223270003_R06C01 0.5636023 0.9327211 0.4833367 0.80814756 0.8108207 0.8905903 0.43772680 0.8737362 0.7444698 0.8525611 0.8466199 0.7839538 0.9352717 0.9013686 0.7649181 0.7435100
## cg25366315 cg10738648 cg17738613 cg08779649 cg16779438 cg10738049 cg15600437 cg10091792 cg19471911 cg11286989 cg02887598 cg12146221 cg26948066 cg27086157 cg26853071 cg04316537
## 200223270003_R02C01 0.9182318 0.44931577 0.6879612 0.44449401 0.8826150 0.5441211 0.4885353 0.8670733 0.6334393 0.7590008 0.04020908 0.2049284 0.4685225 0.9224112 0.4233820 0.8074830
## 200223270003_R03C01 0.9209800 0.49894016 0.6582258 0.45076825 0.5466924 0.5232715 0.4894487 0.5864221 0.8437175 0.8533989 0.67073881 0.1814927 0.5026045 0.9219304 0.7451354 0.8453340
## 200223270003_R06C01 0.8972984 0.05552024 0.1022257 0.04810217 0.8629492 0.4875473 0.8551374 0.6087997 0.6127952 0.7313884 0.73408417 0.8619250 0.9101976 0.3224986 0.4228079 0.4351695
## cg06960717 cg13739190 cg06403901 cg14582632 cg21507367 cg17186592 cg07634717 cg09216282 cg22112152 cg00084271 cg19301366 cg00154902 cg23836570 cg05234269 cg19799454 cg26705599
## 200223270003_R02C01 0.7030978 0.8510103 0.92790690 0.8475098 0.9268560 0.9230463 0.7483382 0.9349248 0.8476101 0.8103611 0.8831393 0.5137741 0.58688450 0.93848584 0.9178930 0.8585917
## 200223270003_R03C01 0.7653402 0.8358482 0.04783341 0.5526692 0.9290102 0.8593448 0.8254434 0.9244259 0.8014136 0.7877006 0.8072679 0.8540746 0.54259383 0.57461229 0.9106247 0.8613854
## 200223270003_R06C01 0.7206218 0.8419471 0.05253626 0.5288675 0.9039559 0.8467599 0.8181246 0.9263996 0.7897897 0.7706165 0.8796022 0.8188126 0.03267304 0.02467208 0.9066551 0.4332832
## cg04718469 cg05799088 cg10666341 cg12333628 cg15985500 cg16202259 cg16771215 cg27160885 cg12689021 cg13815695 cg14307563 cg25436480 cg03982462 cg00767423 cg12421087 cg22535849
## 200223270003_R02C01 0.8687522 0.9023317 0.9046648 0.9227884 0.8555262 0.9548726 0.88389723 0.2231606 0.7706828 0.9267057 0.1855966 0.8425160 0.8562777 0.9298253 0.5647607 0.8847704
## 200223270003_R03C01 0.7256813 0.8779381 0.6731062 0.9092861 0.8312198 0.3713483 0.07196933 0.8263885 0.7449475 0.6859729 0.8916957 0.4994032 0.6023731 0.2651854 0.5399655 0.8609966
## 200223270003_R06C01 0.8521881 0.6887230 0.6443180 0.5084647 0.8492103 0.4852461 0.09949974 0.2121179 0.7872237 0.6509046 0.8750052 0.3494312 0.8778458 0.8667808 0.5400348 0.8808022
## cg11268585 cg24139837 cg04728936 cg01128042 cg06394820 cg08669168 cg09727210 cg06286533 cg18918831 cg20678988 cg11019791 cg06715136 cg15138543 cg11133939 cg15775217 cg21415084
## 200223270003_R02C01 0.2521544 0.07404605 0.2172057 0.9113420 0.8513195 0.9226769 0.4240111 0.2734841 0.4891660 0.8438718 0.8112324 0.3400192 0.7734778 0.1282694 0.5707441 0.8374415
## 200223270003_R03C01 0.8535791 0.04183445 0.1925451 0.5328806 0.8695521 0.9164547 0.8812928 0.9354924 0.5333801 0.8548886 0.7831231 0.9259109 0.2949313 0.5920898 0.9168327 0.8509420
## 200223270003_R06C01 0.9121931 0.05657120 0.2379376 0.5222757 0.4415020 0.6362087 0.8493743 0.8696546 0.6406575 0.7786685 0.4353250 0.9079807 0.2496147 0.5127706 0.6042521 0.8378237
## cg20208879 cg22071943 cg02372404 cg05891136 cg03327352 cg25879395 cg02356645 cg04540199 cg09584650 cg26642936 cg21783012 cg12702014 cg11540596 cg16180556 cg22542451 cg19097407
## 200223270003_R02C01 0.66986658 0.8705217 0.03598249 0.7797403 0.8851712 0.88130864 0.5105903 0.8165865 0.08230254 0.7619266 0.9142369 0.7704049 0.9238951 0.39300141 0.5884356 0.1417931
## 200223270003_R03C01 0.02423079 0.2442648 0.02767285 0.3310206 0.8786878 0.02603438 0.5833923 0.7964195 0.09661586 0.7023413 0.6694884 0.7848681 0.8926595 0.07312155 0.8337068 0.8367297
## 200223270003_R06C01 0.61769424 0.2644581 0.03127855 0.7965298 0.3042310 0.91060615 0.5701428 0.4698047 0.52399749 0.7099380 0.9070112 0.8065993 0.8820252 0.20051805 0.8125084 0.2276425
## cg06697310 cg04242342 cg05155812 cg26983017 cg00322003 cg11882358 cg05130642 cg04462915 cg17653352 cg20300784 cg07227024 cg03723481 cg26069044 cg06371647 cg12953206 cg01008088
## 200223270003_R02C01 0.8454609 0.8206769 0.4514427 0.89868232 0.1759911 0.89136326 0.8575504 0.03224861 0.9269778 0.86585964 0.04553128 0.4347333 0.9240187 0.8336894 0.2364836 0.8424817
## 200223270003_R03C01 0.8653044 0.8167892 0.9070932 0.03145466 0.5702070 0.04943344 0.8644077 0.50740695 0.9086951 0.86609999 0.05004286 0.9007774 0.9407223 0.8198684 0.2338141 0.2417656
## 200223270003_R06C01 0.2405168 0.8040357 0.4107396 0.84677625 0.3077122 0.80176322 0.3661324 0.02700644 0.9341775 0.03091187 0.06152206 0.8947417 0.9332131 0.8069537 0.6638030 0.2618620
## cg14623940 cg24851651 cg15586958 cg03395511 cg21209485 cg25598710 cg08745107 cg00553601 cg04645024 cg04831745 cg14228103 cg05876883 cg00512739 cg03749159 cg14240646 cg01153376
## 200223270003_R02C01 0.7623774 0.03674702 0.9058263 0.4491605 0.8865053 0.3105752 0.02921338 0.05601299 0.7366541 0.61984995 0.9141064 0.9039064 0.9337648 0.9355921 0.5391334 0.4872148
## 200223270003_R03C01 0.8732905 0.05358297 0.8957526 0.4835967 0.8714878 0.3088142 0.78542320 0.58957701 0.8454827 0.71214149 0.8591302 0.9223308 0.8863895 0.9153921 0.2538363 0.9639670
## 200223270003_R06C01 0.8661720 0.05968923 0.9121763 0.5523959 0.2292550 0.8538820 0.02709928 0.62426500 0.0871902 0.06871768 0.1834348 0.4697980 0.9242748 0.9255807 0.1864902 0.2242410
## cg03600007 cg27577781 cg22169467 cg10993865 cg16089727 cg16536985 cg03129555 cg03549208 cg05161773 cg19377607 cg22666875 cg24634455 cg16655091 cg06012903 cg17061760 cg11187460
## 200223270003_R02C01 0.5658487 0.8143535 0.3095010 0.9173768 0.86748697 0.5789643 0.6079616 0.9014487 0.4120912 0.05377464 0.8177182 0.7796391 0.6055295 0.7964595 0.08726914 0.03672179
## 200223270003_R03C01 0.6018832 0.8113185 0.2978585 0.9096170 0.54996692 0.5418687 0.5785498 0.8381784 0.4154907 0.90570746 0.8291957 0.5188241 0.7053336 0.1933431 0.59377488 0.92516409
## 200223270003_R06C01 0.8611166 0.8144274 0.8955853 0.4904519 0.05876736 0.8392044 0.9137818 0.9097817 0.8526849 0.06636174 0.3694180 0.5325725 0.8724479 0.1960773 0.83354475 0.03109553
## cg06864789 cg25306893 cg01910713 cg01549082 cg03635532 cg02078724 cg09247979 cg03737947 cg10890644 cg04888234 cg12012426 cg00689685 cg17268094 cg17018422 cg00247094 cg02495179
## 200223270003_R02C01 0.05369415 0.6265392 0.8573169 0.2924138 0.8416733 0.3096774 0.5070956 0.91824910 0.1402372 0.8379655 0.9165048 0.7019389 0.5774753 0.5262747 0.5399349 0.6813307
## 200223270003_R03C01 0.46053125 0.8330282 0.8538850 0.7065693 0.8262538 0.2896133 0.5706177 0.92067153 0.1348023 0.4376314 0.9434768 0.8634268 0.9003262 0.9029604 0.9315640 0.7373055
## 200223270003_R06C01 0.87513655 0.6175380 0.8110366 0.2895440 0.8450480 0.2805612 0.5090215 0.03638091 0.1407028 0.8039047 0.9220044 0.6378795 0.8789368 0.5100750 0.5177874 0.5588114
## cg18949721 cg12063064 cg08914944 cg23159970 cg09289202 cg08096656 cg19242610 cg01023242 cg04768387 cg05392160
## 200223270003_R02C01 0.2334245 0.9357515 0.63423942 0.61817246 0.4361103 0.9362594 0.5188218 0.7210683 0.3131047 0.9328933
## 200223270003_R03C01 0.2437792 0.9436901 0.04392811 0.57492600 0.4397504 0.9314878 0.9236389 0.9032685 0.9465814 0.2576881
## 200223270003_R06C01 0.2523095 0.5490657 0.06893322 0.03288909 0.4193555 0.4943033 0.8761320 0.7831190 0.9098563 0.8920726
## [ reached 'max' / getOption("max.print") -- omitted 3 rows ]
# Evaluate the median-importance feature set with an elastic-net logistic regression
df_LRM1 <- processed_data
featureName_LRM1 <- AfterProcess_FeatureName
library(glmnet)
library(caret)
set.seed(123)
# 70/30 stratified split on the diagnosis label DX
trainIndex <- createDataPartition(df_LRM1$DX, p = 0.7, list = FALSE)
trainData <- df_LRM1[trainIndex, ]
testData <- df_LRM1[-trainIndex, ]
dim(trainData)
## [1] 389 251
dim(testData)
## [1] 165 251
# Tune the elastic-net model with 5-fold cross-validation, then evaluate on the held-out test set
ctrl <- trainControl(method = "cv", number = 5)
model_LRM1 <- caret::train(DX ~ ., data = trainData, method = "glmnet", trControl = ctrl)
predictions <- predict(model_LRM1, newdata = testData, type = "raw")
cm_FeatEval_Median_LRM1 <- caret::confusionMatrix(predictions, testData$DX)
print(cm_FeatEval_Median_LRM1)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CN MCI
## CN 50 15
## MCI 16 84
##
## Accuracy : 0.8121
## 95% CI : (0.744, 0.8686)
## No Information Rate : 0.6
## P-Value [Acc > NIR] : 4.314e-09
##
## Kappa : 0.6076
##
## Mcnemar's Test P-Value : 1
##
## Sensitivity : 0.7576
## Specificity : 0.8485
## Pos Pred Value : 0.7692
## Neg Pred Value : 0.8400
## Prevalence : 0.4000
## Detection Rate : 0.3030
## Detection Prevalence : 0.3939
## Balanced Accuracy : 0.8030
##
## 'Positive' Class : CN
##
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
cm_FeatEval_Median_LRM1_Accuracy <- cm_FeatEval_Median_LRM1$overall["Accuracy"]
cm_FeatEval_Median_LRM1_Kappa <- cm_FeatEval_Median_LRM1$overall["Kappa"]
print(cm_FeatEval_Median_LRM1_Accuracy)
## Accuracy
## 0.8121212
print(cm_FeatEval_Median_LRM1_Kappa)
## Kappa
## 0.6075949
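As a minimal sketch (not part of the pipeline), the Accuracy and Cohen's Kappa reported by `confusionMatrix()` can be recomputed directly from the 2x2 table printed above:

```r
# Prediction x Reference counts taken from the confusion matrix output above
cm <- matrix(c(50, 16, 15, 84), nrow = 2,
             dimnames = list(Prediction = c("CN", "MCI"),
                             Reference  = c("CN", "MCI")))
n  <- sum(cm)                               # 165 test samples
po <- sum(diag(cm)) / n                     # observed agreement = Accuracy
pe <- sum(rowSums(cm) * colSums(cm)) / n^2  # agreement expected by chance
kappa <- (po - pe) / (1 - pe)               # Cohen's Kappa
round(c(Accuracy = po, Kappa = kappa), 7)   # 0.8121212 0.6075949
```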
print(model_LRM1)
## glmnet
##
## 389 samples
## 250 predictors
## 2 classes: 'CN', 'MCI'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 311, 312, 311, 311, 311
## Resampling results across tuning parameters:
##
## alpha lambda Accuracy Kappa
## 0.10 0.0001780646 0.8816850 0.7501594
## 0.10 0.0017806455 0.8713953 0.7285826
## 0.10 0.0178064554 0.8533800 0.6902215
## 0.55 0.0001780646 0.8508492 0.6849685
## 0.55 0.0017806455 0.8380286 0.6560454
## 0.55 0.0178064554 0.7711955 0.5122092
## 1.00 0.0001780646 0.8020979 0.5796643
## 1.00 0.0017806455 0.7968698 0.5665562
## 1.00 0.0178064554 0.7326340 0.4319614
##
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0.1 and lambda = 0.0001780646.
train_predictions <- predict(model_LRM1, newdata = trainData, type = "raw")
train_accuracy <- mean(train_predictions == trainData$DX)
FeatEval_Median_LRM1_trainAccuracy<-train_accuracy
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 1"
print(FeatEval_Median_LRM1_trainAccuracy)
## [1] 1
mean_accuracy_model_LRM1 <- mean(model_LRM1$results$Accuracy)
print("The mean accuracy across all tuning parameter settings is:")
## [1] "The mean accuracy across all tuning parameter settings is:"
print(mean_accuracy_model_LRM1)
## [1] 0.822015
FeatEval_Median_mean_accuracy_cv_LRM1 <- mean_accuracy_model_LRM1
print(FeatEval_Median_mean_accuracy_cv_LRM1)
## [1] 0.822015
library(caret)
library(pROC)
if (METHOD_FEATURE_FLAG ==5){
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
roc_curve <- roc(testData$DX,
prob_predictions[, "MCI"],
levels = rev(levels(testData$DX)))
auc_value <- roc_curve$auc
FeatEval_Median_LRM1_AUC <-auc_value
print(roc_curve)
print("The auc value is:")
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls > cases
##
## Call:
## roc.default(response = testData$DX, predictor = prob_predictions[, "MCI"], levels = rev(levels(testData$DX)))
##
## Data: prob_predictions[, "MCI"] in 99 controls (testData$DX MCI) > 66 cases (testData$DX CN).
## Area under the curve: 0.8907
## [1] "The auc value is:"
## Area under the curve: 0.8907
if (METHOD_FEATURE_FLAG ==4 || METHOD_FEATURE_FLAG==6){
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
roc_curve <- roc(testData$DX,
prob_predictions[, "Dementia"],
levels = rev(levels(testData$DX)))
auc_value <- roc_curve$auc
FeatEval_Median_LRM1_AUC <-auc_value
print(roc_curve)
print("The auc value is:")
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG ==3){
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
roc_curve <- roc(testData$DX,
prob_predictions[, "CI"],
levels = rev(levels(testData$DX)))
auc_value <- roc_curve$auc
FeatEval_Median_LRM1_AUC <-auc_value
print(roc_curve)
print("The auc value is:")
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG ==1){
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(testData$DX)
for (class in classes) {
binary_labels <- ifelse(testData$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
plot(roc_curves[[1]], col = "blue",
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = i+1, lwd = 2)
}
legend("bottomright", legend = classes, col = 1:length(classes)+1, lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
FeatEval_Median_LRM1_AUC <-mean_auc
}
print(FeatEval_Median_LRM1_AUC)
## Area under the curve: 0.8907
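The AUC values above come from `pROC::roc()`. As a sanity check, the same quantity equals the Mann-Whitney statistic and can be computed from ranks; a minimal sketch with toy labels and scores (hypothetical values, not the model's predictions):

```r
# AUC via the rank (Mann-Whitney) formula: probability that a random
# positive is scored higher than a random negative.
auc_rank <- function(labels, scores) {
  r  <- rank(scores)
  n1 <- sum(labels == 1)   # positives
  n0 <- sum(labels == 0)   # negatives
  (sum(r[labels == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}
auc_rank(c(0, 0, 1, 1), c(0.1, 0.4, 0.35, 0.8))   # 0.75
```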
importance_model_LRM1 <- varImp(model_LRM1)
print(importance_model_LRM1)
## glmnet variable importance
##
## only 20 most important variables shown (out of 250)
##
## Overall
## PC2 100.00
## cg13405878 54.62
## cg27272246 53.96
## cg00004073 53.45
## cg14687298 47.12
## cg03924089 46.66
## cg14582632 46.31
## cg06833284 45.80
## cg08788093 45.76
## cg00086247 43.40
## cg20685672 43.14
## cg23432430 42.74
## cg22933800 42.28
## cg19471911 40.52
## cg11169344 40.50
## cg07480955 40.30
## cg14710850 39.91
## cg14168080 39.84
## cg02631626 39.84
## cg16652920 39.32
plot(importance_model_LRM1, top = 20, main = "Variable Importance Plot")
importance_model_LRM1_df<-importance_model_LRM1$importance
if(METHOD_FEATURE_FLAG==3 || METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG==5 || METHOD_FEATURE_FLAG==6){
importance_final_model_LRM1 <- varImp(model_LRM1$finalModel)
library(dplyr)
ordered_importance_final_model_LRM1 <- importance_final_model_LRM1 %>% arrange(desc(Overall))
print(ordered_importance_final_model_LRM1)
}
## Overall
## 1 7.9721061165
## 2 4.3540845078
## 3 4.3015865235
## 4 4.2607017884
## 5 3.7563781597
## 6 3.7195768249
## 7 3.6919915937
## 8 3.6510791197
## 9 3.6478396933
## 10 3.4598636113
## 11 3.4394391120
## 12 3.4070192452
## 13 3.3709128769
## 14 3.2305063987
## 15 3.2283621343
## 16 3.2126138086
## 17 3.1815422574
## 18 3.1759598673
## 19 3.1757907838
## 20 3.1342690271
## 21 3.0776769006
## 22 3.0628173487
## 23 3.0221337059
## 24 3.0151038436
## 25 2.9920510855
## 26 2.9835710342
## 27 2.9703010474
## 28 2.9473277223
## 29 2.9460554670
## 30 2.8944407026
## 31 2.8782344517
## 32 2.7925008258
## 33 2.7747897972
## 34 2.7160681724
## 35 2.6823906576
## 36 2.6809814197
## 37 2.6420355904
## 38 2.6221133784
## 39 2.6113937956
## 40 2.6043849360
## 41 2.5646485767
## 42 2.4506581138
## 43 2.4202885604
## 44 2.4192455708
## 45 2.3979349325
## 46 2.3938606035
## 47 2.3872043137
## 48 2.3846328773
## 49 2.3560688107
## 50 2.3524670902
## 51 2.3507215689
## 52 2.3320952107
## 53 2.2762104801
## 54 2.2754603849
## 55 2.2458628859
## 56 2.2414966949
## 57 2.1766835389
## 58 2.1407788226
## 59 2.1306699527
## 60 2.1121955409
## 61 2.0993241049
## 62 2.0948633715
## 63 2.0920274127
## 64 2.0901241430
## 65 2.0792714890
## 66 2.0649604750
## 67 2.0633767527
## 68 1.9907106193
## 69 1.9833414659
## 70 1.9797515667
## 71 1.9571988756
## 72 1.9081453735
## 73 1.9020225436
## 74 1.8853311522
## 75 1.8781730723
## 76 1.8587413614
## 77 1.8413464492
## 78 1.8294047652
## 79 1.8235473856
## 80 1.8145877689
## 81 1.8130418176
## 82 1.7953744607
## 83 1.7899596943
## 84 1.7849378646
## 85 1.7759273730
## 86 1.7688125319
## 87 1.7630683762
## 88 1.7381446924
## 89 1.7152431542
## 90 1.7151054510
## 91 1.7144856133
## 92 1.7076113110
## 93 1.6859764272
## 94 1.6842792602
## 95 1.6474442255
## 96 1.6423260161
## 97 1.6385693779
## 98 1.6209427688
## 99 1.5513291766
## 100 1.5466310013
## 101 1.5462753281
## 102 1.5351397114
## 103 1.5332495855
## 104 1.5211641927
## 105 1.5188657032
## 106 1.5050111203
## 107 1.5045685163
## 108 1.5018661266
## 109 1.4924195850
## 110 1.4855907133
## 111 1.4833947291
## 112 1.4832289244
## 113 1.4614832116
## 114 1.4578645176
## 115 1.4561724752
## 116 1.4428258405
## 117 1.4423887141
## 118 1.4226446605
## 119 1.4220269793
## 120 1.4144466791
## 121 1.3984038461
## 122 1.3901587619
## 123 1.3743180381
## 124 1.3701980248
## 125 1.3650736112
## 126 1.3582729977
## 127 1.3582558935
## 128 1.3508732459
## 129 1.3420330159
## 130 1.3213478253
## 131 1.3163807793
## 132 1.3161867225
## 133 1.3138129667
## 134 1.3099911657
## 135 1.3078355680
## 136 1.2859511979
## 137 1.2824229678
## 138 1.2796348053
## 139 1.2788540043
## 140 1.2763357577
## 141 1.2668435573
## 142 1.2542808734
## 143 1.2532639810
## 144 1.2397395675
## 145 1.2305254370
## 146 1.2143525963
## 147 1.2018115957
## 148 1.1875665217
## 149 1.1845811329
## 150 1.1709480508
## 151 1.1408517835
## 152 1.1355618227
## 153 1.1262039443
## 154 1.1237526261
## 155 1.1155628307
## 156 1.0954559303
## 157 1.0838559469
## 158 1.0802862560
## 159 1.0637038229
## 160 1.0612733402
## 161 1.0575964426
## 162 1.0335574681
## 163 1.0239475155
## 164 1.0232138975
## 165 1.0082323214
## 166 1.0063222076
## 167 1.0053848610
## 168 0.9905672740
## 169 0.9785972114
## 170 0.9660531603
## 171 0.9615784874
## 172 0.9078273077
## 173 0.9037750641
## 174 0.9036882822
## 175 0.8924425605
## 176 0.8912263358
## 177 0.8908449993
## 178 0.8873097330
## 179 0.8772684297
## 180 0.8237531715
## 181 0.7994331550
## 182 0.7977275804
## 183 0.7953597972
## 184 0.7879049807
## 185 0.7730360946
## 186 0.7709930720
## 187 0.7540761417
## 188 0.7438987456
## 189 0.7432737378
## 190 0.7383407493
## 191 0.7239026712
## 192 0.6998303730
## 193 0.6985329324
## 194 0.6957013077
## 195 0.6834580513
## 196 0.6771625278
## 197 0.6701838444
## 198 0.6675493736
## 199 0.6377074730
## 200 0.5832960885
## 201 0.5722926104
## 202 0.5595250219
## 203 0.5480811505
## 204 0.5439772862
## 205 0.5222956818
## 206 0.5194119525
## 207 0.5035395522
## 208 0.4967512551
## 209 0.4959566538
## 210 0.4901995257
## 211 0.4786818223
## 212 0.4756453393
## 213 0.4619063110
## 214 0.4608487997
## 215 0.4458608627
## 216 0.3670526901
## 217 0.3492744277
## 218 0.3391842380
## 219 0.3321843718
## 220 0.3080126253
## 221 0.3063212926
## 222 0.2974788729
## 223 0.2973583369
## 224 0.2865637369
## 225 0.2775684899
## 226 0.2562886866
## 227 0.2555624047
## 228 0.1796069252
## 229 0.1794652591
## 230 0.1708604899
## 231 0.1395458036
## 232 0.1300241458
## 233 0.1181637123
## 234 0.0923566404
## 235 0.0684982123
## 236 0.0591542736
## 237 0.0268458438
## 238 0.0180154761
## 239 0.0122048209
## 240 0.0088667879
## 241 0.0027528058
## 242 0.0007227066
## 243 0.0006134142
## 244 0.0000000000
## 245 0.0000000000
## 246 0.0000000000
## 247 0.0000000000
## 248 0.0000000000
## 249 0.0000000000
## 250 0.0000000000
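The percentages from `varImp(model_LRM1)` and the raw magnitudes from `varImp(model_LRM1$finalModel)` are linked by caret's default min-max rescaling to 0-100. A sketch using the top three raw values printed above, assuming (as the scaled table suggests) they correspond to PC2, cg13405878 and cg27272246:

```r
raw <- c(PC2 = 7.9721061165, cg13405878 = 4.3540845078, cg27272246 = 4.3015865235)
# caret's default scaling is 100 * (x - min) / (max - min); the minimum
# importance in this model is 0 (see the tail of the table above)
scaled <- 100 * (raw - 0) / (max(raw) - 0)
round(scaled, 2)   # 100.00 54.62 53.96
```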
if(METHOD_FEATURE_FLAG==1){
# for the multi classification case,
# for each feature, we will choose the maximum importance value
# Add a column for the maximum importance
importance_model_LRM1_df$Feature<-rownames(importance_model_LRM1_df)
importance_model_LRM1_df <- importance_model_LRM1_df %>%
mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
arrange(desc(MaxImportance))
print(importance_model_LRM1_df)
}
if (!require(reshape2)) {
install.packages("reshape2")
library(reshape2)
} else {
library(reshape2)
}
if(METHOD_FEATURE_FLAG == 1){
importance_melted_LRM1_df <- importance_model_LRM1_df %>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_LRM1_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
if(METHOD_FEATURE_FLAG == 1){
print(importance_model_LRM1_df %>% head(20))
print("the top 20 features based on max way:")
print(head(importance_model_LRM1_df,n=20)$Feature)
importance_melted_LRM1_df <- importance_model_LRM1_df %>%
head(20)%>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_LRM1_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
table(df_LRM1$DX)
##
## CN MCI
## 221 333
prop.table(table(df_LRM1$DX))
##
## CN MCI
## 0.398917 0.601083
table(trainData$DX)
##
## CN MCI
## 155 234
prop.table(table(trainData$DX))
##
## CN MCI
## 0.3984576 0.6015424
barplot(table(df_LRM1$DX), main = "Whole Data Class Distribution")
For the training data set:
barplot(table(trainData$DX), main = "Train Data Class Distribution")
Let’s calculate the imbalance ratio: the number of samples in the majority class divided by the number of samples in the minority class. A high ratio indicates severe class imbalance.
class_counts <- table(df_LRM1$DX)
imbalance_ratio <- max(class_counts) / min(class_counts)
print("The imbalance ratio of the whole data set is:")
## [1] "The imbalance ratio of the whole data set is:"
print(imbalance_ratio)
## [1] 1.506787
class_counts <- table(trainData$DX)
imbalance_ratio <- max(class_counts) / min(class_counts)
print("The imbalance ratio of the training data set is:")
## [1] "The imbalance ratio of the training data set is:"
print(imbalance_ratio)
## [1] 1.509677
Let’s run a Chi-square goodness-of-fit test, which determines whether the class distribution deviates significantly from a balanced one; the test’s p-value indicates how significant the class imbalance is.
chisq.test(table(df_LRM1$DX))
##
## Chi-squared test for given probabilities
##
## data: table(df_LRM1$DX)
## X-squared = 22.643, df = 1, p-value = 1.951e-06
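A minimal sketch: the goodness-of-fit statistic for the whole data set, recomputed by hand. Under the null of a balanced distribution each class expects half of the 554 samples.

```r
observed <- c(CN = 221, MCI = 333)               # counts from table(df_LRM1$DX)
expected <- rep(sum(observed) / 2, 2)            # 277 per class under the null
x2 <- sum((observed - expected)^2 / expected)
round(x2, 3)                                     # 22.643, matching the output above
pchisq(x2, df = 1, lower.tail = FALSE)           # ~1.95e-06
```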
chisq.test(table(trainData$DX))
##
## Chi-squared test for given probabilities
##
## data: table(trainData$DX)
## X-squared = 16.044, df = 1, p-value = 6.19e-05
library(smotefamily)
# Oversample the minority class (CN) with SMOTE: K = 5 nearest neighbours, one synthetic sample per minority observation
smote_data_LGR_1 <- SMOTE(X = trainData[, !names(trainData) %in% "DX"], target = trainData$DX, K = 5, dup_size = 1)
balanced_data_LGR_1 <- smote_data_LGR_1$data
# smotefamily names the label column "class"; restore the original name DX
colnames(balanced_data_LGR_1)[colnames(balanced_data_LGR_1) == "class"] <- "DX"
table(balanced_data_LGR_1$DX)
##
## CN MCI
## 310 234
dim(balanced_data_LGR_1)
## [1] 544 251
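The balancing above is done by `smotefamily::SMOTE`; its core interpolation step can be sketched with toy numbers (not the methylation data): each synthetic minority sample lies on the segment between a minority point and one of its K nearest minority neighbours.

```r
set.seed(1)
x_i   <- c(0.20, 0.80)            # a minority-class sample (toy values)
x_nn  <- c(0.40, 0.60)            # one of its nearest minority neighbours
gap   <- runif(1)                 # drawn uniformly from [0, 1]
x_syn <- x_i + gap * (x_nn - x_i) # synthetic sample on the connecting segment
x_syn                             # componentwise between x_i and x_nn
```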
# Retrain the elastic-net model on the SMOTE-balanced data and evaluate on the untouched test set
ctrl <- trainControl(method = "cv", number = 5)
model_LRM2 <- caret::train(DX ~ ., data = balanced_data_LGR_1, method = "glmnet", trControl = ctrl)
predictions <- predict(model_LRM2, newdata = testData)
caret::confusionMatrix(predictions, testData$DX)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CN MCI
## CN 53 17
## MCI 13 82
##
## Accuracy : 0.8182
## 95% CI : (0.7507, 0.8738)
## No Information Rate : 0.6
## P-Value [Acc > NIR] : 1.457e-09
##
## Kappa : 0.625
##
## Mcnemar's Test P-Value : 0.5839
##
## Sensitivity : 0.8030
## Specificity : 0.8283
## Pos Pred Value : 0.7571
## Neg Pred Value : 0.8632
## Prevalence : 0.4000
## Detection Rate : 0.3212
## Detection Prevalence : 0.4242
## Balanced Accuracy : 0.8157
##
## 'Positive' Class : CN
##
print(model_LRM2)
## glmnet
##
## 544 samples
## 250 predictors
## 2 classes: 'CN', 'MCI'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 435, 435, 435, 435, 436
## Resampling results across tuning parameters:
##
## alpha lambda Accuracy Kappa
## 0.10 0.0002095926 0.9393476 0.8753074
## 0.10 0.0020959263 0.9356609 0.8675750
## 0.10 0.0209592633 0.9301563 0.8563053
## 0.55 0.0002095926 0.9099049 0.8139291
## 0.55 0.0020959263 0.9044173 0.8030570
## 0.55 0.0209592633 0.8547910 0.7002776
## 1.00 0.0002095926 0.8860686 0.7644335
## 1.00 0.0020959263 0.8750255 0.7420826
## 1.00 0.0209592633 0.8143901 0.6153065
##
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0.1 and lambda = 0.0002095926.
train_predictions <- predict(model_LRM2, newdata = trainData, type = "raw")
train_accuracy <- mean(train_predictions == trainData$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 1"
mean_accuracy_model_LRM2 <- mean(model_LRM2$results$Accuracy)
print("The mean accuracy across all tuning parameter settings is:")
## [1] "The mean accuracy across all tuning parameter settings is:"
print(mean_accuracy_model_LRM2)
## [1] 0.894418
importance_model_LRM2 <- varImp(model_LRM2)
print(importance_model_LRM2)
## glmnet variable importance
##
## only 20 most important variables shown (out of 250)
##
## Overall
## PC2 100.00
## cg00004073 50.91
## cg27272246 49.66
## cg13405878 49.42
## cg14687298 46.88
## cg06833284 46.32
## cg08788093 45.04
## cg03924089 44.88
## cg14582632 44.71
## cg00086247 42.62
## cg20685672 42.54
## cg23432430 41.72
## cg11169344 40.60
## cg22933800 39.80
## cg07480955 39.76
## cg14168080 39.51
## cg16652920 39.07
## cg19471911 38.93
## cg14710850 38.69
## cg21243064 38.21
plot(importance_model_LRM2, top = 20, main = "Variable Importance Plot")
importance_model_LRM2_df<-importance_model_LRM2$importance
if(METHOD_FEATURE_FLAG==3 || METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG==5 || METHOD_FEATURE_FLAG==6){
# For the binary-classification cases, rank features by the magnitude of the
# final glmnet coefficients; show only the top 20 (the full 250-row printout
# carries no feature names)
importance_final_model_LRM2 <- varImp(model_LRM2$finalModel)
library(dplyr)
ordered_importance_final_model_LRM2 <- importance_final_model_LRM2 %>% arrange(desc(Overall))
print(head(ordered_importance_final_model_LRM2, n = 20))
}
## Overall
## 1 7.993368746
## 2 4.069725633
## 3 3.969428196
## 4 3.950416942
## 5 3.747105760
## 6 3.702648921
## 7 3.600514422
## 8 3.587111790
## 9 3.573623387
## 10 3.407128761
## 11 3.400013891
## 12 3.334619914
## 13 3.244984200
## 14 3.181487102
## 15 3.178113080
## 16 3.157888721
## 17 3.123357885
## 18 3.111501753
## 19 3.092321383
## 20 3.053987840
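For a glmnet final model, `varImp` on the final model amounts to the absolute values of the fitted coefficients at the chosen lambda, so the ordering above is a ranking by |beta|. A minimal sketch of that ranking, using hypothetical coefficient values (Python for illustration only):

```python
# Hypothetical glmnet coefficients; varImp(finalModel) ranks features by |beta|
coefs = {"cg_a": -7.99, "cg_b": 0.00, "cg_c": 4.07, "cg_d": -3.97}

ranked = sorted(coefs.items(), key=lambda kv: abs(kv[1]), reverse=True)
print([name for name, _ in ranked])  # ['cg_a', 'cg_c', 'cg_d', 'cg_b']
```

Features shrunk exactly to zero (like the trailing 0.000000000 rows above) always land at the bottom of this ranking.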
if(METHOD_FEATURE_FLAG==1){
# For the multi-class case, keep each feature's maximum
# importance across the three classes and sort by it
importance_model_LRM2_df$Feature<-rownames(importance_model_LRM2_df)
importance_model_LRM2_df <- importance_model_LRM2_df %>%
mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
arrange(desc(MaxImportance))
print(importance_model_LRM2_df)
}
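The `pmax` step above keeps, for every feature, the largest of its three per-class importances before sorting. The same reduction, sketched with toy numbers (Python for illustration only):

```python
# Toy per-class importances (CN, Dementia, MCI) for each feature
importance = {
    "cg_x": (10.0, 35.0, 5.0),
    "cg_y": (60.0, 12.0, 8.0),
    "cg_z": (20.0, 22.0, 50.0),
}

# MaxImportance = pmax(CN, Dementia, MCI), then arrange(desc(MaxImportance))
max_importance = {f: max(vals) for f, vals in importance.items()}
ranked = sorted(max_importance, key=max_importance.get, reverse=True)
print(ranked)  # ['cg_y', 'cg_z', 'cg_x']
```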
if(METHOD_FEATURE_FLAG == 1){
importance_melted_LRM2_df <- importance_model_LRM2_df %>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_LRM2_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
if(METHOD_FEATURE_FLAG == 1){
print(importance_model_LRM2_df %>% head(20))
print("Top 20 features ranked by maximum importance across classes:")
print(head(importance_model_LRM2_df,n=20)$Feature)
importance_melted_LRM2_df <- importance_model_LRM2_df %>%
head(20)%>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_LRM2_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
if(METHOD_FEATURE_FLAG == 5){
prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
roc_curve <- roc(testData$DX, prob_predictions[, "MCI"], levels = rev(levels(testData$DX)))
auc_value <- roc_curve$auc
print(roc_curve)
print("The auc value is:")
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls > cases
##
## Call:
## roc.default(response = testData$DX, predictor = prob_predictions[, "MCI"], levels = rev(levels(testData$DX)))
##
## Data: prob_predictions[, "MCI"] in 99 controls (testData$DX MCI) > 66 cases (testData$DX CN).
## Area under the curve: 0.8904
## [1] "The auc value is:"
## Area under the curve: 0.8904
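The AUC of 0.8904 reported by `pROC::roc` has a simple probabilistic reading: it is the fraction of (MCI, CN) pairs in which the MCI subject receives the higher predicted MCI probability, with ties counted as half. A pairwise sketch of that statistic on toy scores (illustrative data, not the ADNI predictions; Python for illustration only):

```python
def pairwise_auc(pos_scores, neg_scores):
    """AUC as the Mann-Whitney pair statistic: P(pos > neg) + 0.5 * P(tie)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Toy predicted P(MCI): cases (true MCI) vs controls (true CN)
print(pairwise_auc([0.9, 0.8, 0.6], [0.7, 0.3]))  # 5 of the 6 pairs favour cases
```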
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
roc_curve <- roc(testData$DX, prob_predictions[, "Dementia"], levels = rev(levels(testData$DX)))
auc_value <- roc_curve$auc
print(roc_curve)
print("The auc value is:")
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
roc_curve <- roc(testData$DX, prob_predictions[, "CI"], levels = rev(levels(testData$DX)))
auc_value <- roc_curve$auc
print(roc_curve)
print("The auc value is:")
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 1){
prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(testData$DX)
for (class in classes) {
binary_labels <- ifelse(testData$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
plot(roc_curves[[1]], col = "blue",
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = i+1, lwd = 2)
}
legend("bottomright", legend = classes, col = c("blue", 3:(length(classes) + 1)), lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
}
df_ENM1<-processed_data
featureName_ENM1<-AfterProcess_FeatureName
library(caret)
set.seed(123)
trainIndex <- createDataPartition(df_ENM1$DX, p = 0.7, list = FALSE)
trainData_ENM1 <- df_ENM1[trainIndex, ]
testData_ENM1 <- df_ENM1[-trainIndex, ]
ctrl <- trainControl(method = "cv", number = 5)
param_grid <- expand.grid(alpha = 0:1, lambda = seq(0.001, 1, length = 20))
elastic_net_model1 <- caret::train(DX ~ ., data = trainData_ENM1, method = "glmnet",
trControl = ctrl, tuneGrid = param_grid)
print(elastic_net_model1)
## glmnet
##
## 389 samples
## 250 predictors
## 2 classes: 'CN', 'MCI'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 311, 312, 311, 311, 311
## Resampling results across tuning parameters:
##
## alpha lambda Accuracy Kappa
## 0 0.00100000 0.8842158 0.7559188
## 0 0.05357895 0.8841492 0.7539232
## 0 0.10615789 0.8918415 0.7685486
## 0 0.15873684 0.8970030 0.7785470
## 0 0.21131579 0.8918415 0.7668761
## 0 0.26389474 0.8892774 0.7606943
## 0 0.31647368 0.8841159 0.7487658
## 0 0.36905263 0.8815518 0.7430737
## 0 0.42163158 0.8764236 0.7313730
## 0 0.47421053 0.8764236 0.7313730
## 0 0.52678947 0.8764236 0.7313730
## 0 0.57936842 0.8789877 0.7366207
## 0 0.63194737 0.8764236 0.7308636
## 0 0.68452632 0.8764569 0.7304544
## 0 0.73710526 0.8790210 0.7350622
## 0 0.78968421 0.8713287 0.7173241
## 0 0.84226316 0.8610390 0.6934685
## 0 0.89484211 0.8507493 0.6689230
## 0 0.94742105 0.8507493 0.6689230
## 0 1.00000000 0.8481518 0.6625283
## 1 0.00100000 0.7943057 0.5616570
## 1 0.05357895 0.6143523 0.1077704
## 1 0.10615789 0.6015318 0.0000000
## 1 0.15873684 0.6015318 0.0000000
## 1 0.21131579 0.6015318 0.0000000
## 1 0.26389474 0.6015318 0.0000000
## 1 0.31647368 0.6015318 0.0000000
## 1 0.36905263 0.6015318 0.0000000
## 1 0.42163158 0.6015318 0.0000000
## 1 0.47421053 0.6015318 0.0000000
## 1 0.52678947 0.6015318 0.0000000
## 1 0.57936842 0.6015318 0.0000000
## 1 0.63194737 0.6015318 0.0000000
## 1 0.68452632 0.6015318 0.0000000
## 1 0.73710526 0.6015318 0.0000000
## 1 0.78968421 0.6015318 0.0000000
## 1 0.84226316 0.6015318 0.0000000
## 1 0.89484211 0.6015318 0.0000000
## 1 0.94742105 0.6015318 0.0000000
## 1 1.00000000 0.6015318 0.0000000
##
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0 and lambda = 0.1587368.
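The grid above comes from `expand.grid(alpha = 0:1, lambda = seq(0.001, 1, length = 20))`: alpha = 0 is pure ridge, alpha = 1 pure lasso, with 20 evenly spaced lambda values from 0.001 to 1, giving 40 candidate models. A quick check of the grid construction and of the elastic-net penalty term itself (Python for illustration only):

```python
from itertools import product

# seq(0.001, 1, length = 20): 20 evenly spaced values from 0.001 to 1
lambdas = [0.001 + i * (1 - 0.001) / 19 for i in range(20)]
grid = list(product([0, 1], lambdas))   # expand.grid(alpha = 0:1, lambda = ...)
print(len(grid), round(lambdas[1], 8))  # 40 combinations; 0.05357895 as in the table

def elastic_net_penalty(beta, lam, alpha):
    """glmnet penalty: lam * ((1 - alpha)/2 * ||beta||_2^2 + alpha * ||beta||_1)."""
    l1 = sum(abs(b) for b in beta)
    l2 = sum(b * b for b in beta)
    return lam * ((1 - alpha) / 2 * l2 + alpha * l1)

print(elastic_net_penalty([1.0, -2.0], lam=0.5, alpha=1.0))  # 1.5 (pure lasso)
```

The alpha = 1 rows collapsing to Kappa = 0 for larger lambda is the lasso shrinking every coefficient to zero, leaving a majority-class predictor.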
mean_accuracy_elastic_net_model1 <- mean(elastic_net_model1$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_elastic_net_model1)
## [1] 0.7440601
FeatEval_Median_mean_accuracy_cv_ENM1<-mean_accuracy_elastic_net_model1
print(FeatEval_Median_mean_accuracy_cv_ENM1)
## [1] 0.7440601
train_predictions <- predict(elastic_net_model1, newdata = trainData_ENM1, type = "raw")
train_accuracy <- mean(train_predictions == trainData_ENM1$DX)
FeatEval_Median_ENM1_trainAccuracy<-train_accuracy
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 0.992287917737789"
print(FeatEval_Median_ENM1_trainAccuracy)
## [1] 0.9922879
predictions <- predict(elastic_net_model1, newdata = testData_ENM1)
cm_FeatEval_Median_ENM1 <- caret::confusionMatrix(predictions,testData_ENM1$DX)
print(cm_FeatEval_Median_ENM1)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CN MCI
## CN 50 10
## MCI 16 89
##
## Accuracy : 0.8424
## 95% CI : (0.7777, 0.8944)
## No Information Rate : 0.6
## P-Value [Acc > NIR] : 1.263e-11
##
## Kappa : 0.6667
##
## Mcnemar's Test P-Value : 0.3268
##
## Sensitivity : 0.7576
## Specificity : 0.8990
## Pos Pred Value : 0.8333
## Neg Pred Value : 0.8476
## Prevalence : 0.4000
## Detection Rate : 0.3030
## Detection Prevalence : 0.3636
## Balanced Accuracy : 0.8283
##
## 'Positive' Class : CN
##
cm_FeatEval_Median_ENM1_Accuracy<-cm_FeatEval_Median_ENM1$overall["Accuracy"]
cm_FeatEval_Median_ENM1_Kappa<-cm_FeatEval_Median_ENM1$overall["Kappa"]
print(cm_FeatEval_Median_ENM1_Accuracy)
## Accuracy
## 0.8424242
print(cm_FeatEval_Median_ENM1_Kappa)
## Kappa
## 0.6666667
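Both statistics above follow directly from the 2x2 table: accuracy is the diagonal fraction, and Cohen's kappa corrects it for the agreement expected from the marginals alone. Recomputing them from the printed counts (Python for illustration only):

```python
# Confusion matrix from the printout: rows = prediction, cols = reference
#            CN   MCI
cm = [[50, 10],   # predicted CN
      [16, 89]]   # predicted MCI

n = sum(sum(row) for row in cm)
observed = (cm[0][0] + cm[1][1]) / n                      # accuracy
row_m = [sum(row) for row in cm]                          # prediction marginals
col_m = [cm[0][j] + cm[1][j] for j in range(2)]           # reference marginals
expected = sum(r * c for r, c in zip(row_m, col_m)) / n**2
kappa = (observed - expected) / (1 - expected)
print(round(observed, 4), round(kappa, 4))  # 0.8424 0.6667, as confusionMatrix reports
```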
importance_elastic_net_model1<- varImp(elastic_net_model1)
print(importance_elastic_net_model1)
## glmnet variable importance
##
## only 20 most important variables shown (out of 250)
##
## Overall
## PC2 100.00
## cg23432430 53.59
## cg20685672 52.84
## cg27272246 52.04
## cg13405878 51.28
## cg00086247 50.82
## cg16652920 50.26
## cg03924089 48.78
## cg02981548 48.51
## cg02225060 48.09
## cg09015880 47.32
## cg00962106 46.93
## cg06833284 46.57
## cg14710850 45.05
## cg14687298 43.71
## cg06634367 42.94
## cg00004073 42.21
## cg17129965 42.11
## cg17042243 41.93
## cg07028768 41.86
plot(importance_elastic_net_model1, top = 20, main = "Variable Importance Plot")
importance_elastic_net_model1_df<-importance_elastic_net_model1$importance
if(METHOD_FEATURE_FLAG==3 || METHOD_FEATURE_FLAG ==4 || METHOD_FEATURE_FLAG==5 ||METHOD_FEATURE_FLAG==6 ){
# For the binary-classification cases, rank features by the magnitude of the
# final glmnet coefficients; show only the top 20 (the full 250-row printout
# carries no feature names)
importance_elastic_net_final_model1 <- varImp(elastic_net_model1$finalModel)
library(dplyr)
Ordered_importance_elastic_net_final_model1 <- importance_elastic_net_final_model1 %>% arrange(desc(Overall))
print(head(Ordered_importance_elastic_net_final_model1, n = 20))
}
## Overall
## 1 1.194495912
## 2 0.643985479
## 3 0.635095854
## 4 0.625636947
## 5 0.616606636
## 6 0.611114385
## 7 0.604481348
## 8 0.586892070
## 9 0.583674069
## 10 0.578680398
## 11 0.569605531
## 12 0.564973420
## 13 0.560745791
## 14 0.542617875
## 15 0.526722156
## 16 0.517636137
## 17 0.508981327
## 18 0.507754009
## 19 0.505618034
## 20 0.504829883
if(METHOD_FEATURE_FLAG==1){
# For the multi-class case, keep each feature's maximum
# importance across the three classes and sort by it
importance_elastic_net_model1_df$Feature<-rownames(importance_elastic_net_model1_df)
importance_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
arrange(desc(MaxImportance))
print(importance_elastic_net_model1_df)
}
if(METHOD_FEATURE_FLAG == 1){
importance_melted_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_elastic_net_model1_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
if(METHOD_FEATURE_FLAG == 1){
print(importance_elastic_net_model1_df %>% head(20))
print("Top 20 features ranked by maximum importance across classes:")
print(head(importance_elastic_net_model1_df,n=20)$Feature)
importance_melted_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
head(20)%>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_elastic_net_model1_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
if(METHOD_FEATURE_FLAG == 5){
prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
roc_curve <- roc(testData_ENM1$DX,
prob_predictions[, "MCI"],
levels = rev(levels(testData_ENM1$DX)))
auc_value <- roc_curve$auc
FeatEval_Median_ENM1_AUC <- auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls > cases
## Area under the curve: 0.9013
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
roc_curve <- roc(testData_ENM1$DX,
prob_predictions[, "Dementia"],
levels = rev(levels(testData_ENM1$DX)))
auc_value <- roc_curve$auc
FeatEval_Median_ENM1_AUC <- auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
roc_curve <- roc(testData_ENM1$DX,
prob_predictions[, "CI"],
levels = rev(levels(testData_ENM1$DX)))
auc_value <- roc_curve$auc
FeatEval_Median_ENM1_AUC <- auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG ==1){
prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(testData_ENM1$DX)
for (class in classes) {
binary_labels <- ifelse(testData_ENM1$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
plot(roc_curves[[1]], col = "blue",
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = i+1, lwd = 2)
}
legend("bottomright", legend = classes, col = c("blue", 3:(length(classes) + 1)), lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
FeatEval_Median_ENM1_AUC <- mean_auc
}
print(FeatEval_Median_ENM1_AUC)
## Area under the curve: 0.9013
library(caret)
library(xgboost)
library(dplyr)
library(doParallel)
numCores <- detectCores() - 1
c2 <- makeCluster(numCores)
registerDoParallel(c2)
df_XGB1<-processed_data
featureName_XGB1<-AfterProcess_FeatureName
set.seed(123)
trainIndex <- createDataPartition(df_XGB1$DX, p = 0.7, list = FALSE)
trainData_XGB1<- df_XGB1[trainIndex, ]
testData_XGB1 <- df_XGB1[-trainIndex, ]
cv_control <- trainControl(method = "cv", number = 5, allowParallel = TRUE)
xgb_model <- caret::train(
DX ~ ., data = trainData_XGB1,
method = "xgbTree", trControl = cv_control,
metric = "Accuracy"
)
print(xgb_model)
## eXtreme Gradient Boosting
##
## 389 samples
## 250 predictors
## 2 classes: 'CN', 'MCI'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 311, 312, 311, 311, 311
## Resampling results across tuning parameters:
##
## eta max_depth colsample_bytree subsample nrounds Accuracy Kappa
## 0.3 1 0.6 0.50 50 0.6606061 0.2803707
## 0.3 1 0.6 0.50 100 0.6758242 0.3175748
## 0.3 1 0.6 0.50 150 0.6965368 0.3614427
## 0.3 1 0.6 0.75 50 0.6527806 0.2508068
## 0.3 1 0.6 0.75 100 0.6708625 0.2983487
## 0.3 1 0.6 0.75 150 0.6759907 0.3140812
## 0.3 1 0.6 1.00 50 0.6297702 0.1906789
## 0.3 1 0.6 1.00 100 0.6296703 0.2038093
## 0.3 1 0.6 1.00 150 0.6682651 0.2914417
## 0.3 1 0.8 0.50 50 0.6297702 0.1969067
## 0.3 1 0.8 0.50 100 0.6298035 0.2157083
## 0.3 1 0.8 0.50 150 0.6889444 0.3411921
## 0.3 1 0.8 0.75 50 0.6194805 0.1940651
## 0.3 1 0.8 0.75 100 0.6707959 0.3023798
## 0.3 1 0.8 0.75 150 0.6657676 0.2877651
## 0.3 1 0.8 1.00 50 0.6272061 0.1820450
## 0.3 1 0.8 1.00 100 0.6426240 0.2359577
## 0.3 1 0.8 1.00 150 0.6503164 0.2461663
## 0.3 2 0.6 0.50 50 0.6398601 0.2137698
## 0.3 2 0.6 0.50 100 0.6836497 0.3149948
## 0.3 2 0.6 0.50 150 0.6990676 0.3519330
## 0.3 2 0.6 0.75 50 0.6477855 0.2306306
## 0.3 2 0.6 0.75 100 0.6965368 0.3439600
## 0.3 2 0.6 0.75 150 0.6991675 0.3480526
## 0.3 2 0.6 1.00 50 0.6451215 0.2354112
## 0.3 2 0.6 1.00 100 0.6708292 0.2857647
## 0.3 2 0.6 1.00 150 0.6837496 0.3204128
## 0.3 2 0.8 0.50 50 0.6682651 0.2748980
## 0.3 2 0.8 0.50 100 0.7094239 0.3712118
## 0.3 2 0.8 0.50 150 0.7196137 0.3942586
## 0.3 2 0.8 0.75 50 0.6659674 0.2784530
## 0.3 2 0.8 0.75 100 0.6942724 0.3351821
## 0.3 2 0.8 0.75 150 0.7044955 0.3602016
## 0.3 2 0.8 1.00 50 0.6580420 0.2534280
## 0.3 2 0.8 1.00 100 0.6915418 0.3359422
## 0.3 2 0.8 1.00 150 0.6811855 0.3145601
## 0.3 3 0.6 0.50 50 0.6964369 0.3457440
## 0.3 3 0.6 0.50 100 0.7272394 0.4134138
## 0.3 3 0.6 0.50 150 0.7426906 0.4457859
## 0.3 3 0.6 0.75 50 0.6811189 0.3067629
## 0.3 3 0.6 0.75 100 0.6861805 0.3137327
## 0.3 3 0.6 0.75 150 0.7016650 0.3493605
## 0.3 3 0.6 1.00 50 0.6735265 0.2813565
## 0.3 3 0.6 1.00 100 0.6863803 0.3074580
## 0.3 3 0.6 1.00 150 0.6889111 0.3128052
## 0.3 3 0.8 0.50 50 0.6861805 0.3216375
## 0.3 3 0.8 0.50 100 0.6809857 0.3075078
## 0.3 3 0.8 0.50 150 0.6861139 0.3228845
## 0.3 3 0.8 0.75 50 0.6582085 0.2551744
## 0.3 3 0.8 0.75 100 0.6710290 0.2829034
## 0.3 3 0.8 0.75 150 0.6812188 0.3016272
## 0.3 3 0.8 1.00 50 0.6555445 0.2532363
## 0.3 3 0.8 1.00 100 0.6554779 0.2544141
## 0.3 3 0.8 1.00 150 0.6735265 0.2972074
## 0.4 1 0.6 0.50 50 0.6348318 0.2339818
## 0.4 1 0.6 0.50 100 0.6709291 0.3072033
## 0.4 1 0.6 0.50 150 0.6863470 0.3379410
## 0.4 1 0.6 0.75 50 0.6529804 0.2728716
## 0.4 1 0.6 0.75 100 0.6708625 0.2997583
## 0.4 1 0.6 0.75 150 0.7043290 0.3755748
## 0.4 1 0.6 1.00 50 0.6194139 0.1746778
## 0.4 1 0.6 1.00 100 0.6503164 0.2468624
## 0.4 1 0.6 1.00 150 0.6681985 0.2929310
## 0.4 1 0.8 0.50 50 0.6734932 0.3042543
## 0.4 1 0.8 0.50 100 0.6966700 0.3545135
## 0.4 1 0.8 0.50 150 0.7171162 0.4007465
## 0.4 1 0.8 0.75 50 0.6246420 0.2052096
## 0.4 1 0.8 0.75 100 0.6658009 0.2971476
## 0.4 1 0.8 0.75 150 0.6709291 0.2995750
## 0.4 1 0.8 1.00 50 0.6220446 0.1779084
## 0.4 1 0.8 1.00 100 0.6451215 0.2362188
## 0.4 1 0.8 1.00 150 0.6579754 0.2779254
## 0.4 2 0.6 0.50 50 0.6655678 0.2791432
## 0.4 2 0.6 0.50 100 0.6862471 0.3221645
## 0.4 2 0.6 0.50 150 0.6811855 0.3190374
## 0.4 2 0.6 0.75 50 0.6605062 0.2799582
## 0.4 2 0.6 0.75 100 0.7142857 0.3952103
## 0.4 2 0.6 0.75 150 0.7220113 0.4098317
## 0.4 2 0.6 1.00 50 0.6245754 0.1929754
## 0.4 2 0.6 1.00 100 0.6425907 0.2321895
## 0.4 2 0.6 1.00 150 0.6605395 0.2703193
## 0.4 2 0.8 0.50 50 0.6656344 0.2890366
## 0.4 2 0.8 0.50 100 0.7067932 0.3739748
## 0.4 2 0.8 0.50 150 0.7067932 0.3794993
## 0.4 2 0.8 0.75 50 0.6372627 0.2181699
## 0.4 2 0.8 0.75 100 0.6914086 0.3405727
## 0.4 2 0.8 0.75 150 0.6862471 0.3284884
## 0.4 2 0.8 1.00 50 0.6656677 0.2791064
## 0.4 2 0.8 1.00 100 0.6760573 0.3068046
## 0.4 2 0.8 1.00 150 0.6862804 0.3268697
## 0.4 3 0.6 0.50 50 0.6477855 0.2451966
## 0.4 3 0.6 0.50 100 0.6657676 0.2886896
## 0.4 3 0.6 0.50 150 0.6811855 0.3191299
## 0.4 3 0.6 0.75 50 0.6734599 0.2893133
## 0.4 3 0.6 0.75 100 0.6785881 0.3032108
## 0.4 3 0.6 0.75 150 0.6991342 0.3477377
## 0.4 3 0.6 1.00 50 0.6451548 0.2272113
## 0.4 3 0.6 1.00 100 0.6733933 0.2888822
## 0.4 3 0.6 1.00 150 0.6708292 0.2828738
## 0.4 3 0.8 0.50 50 0.6221445 0.1878158
## 0.4 3 0.8 0.50 100 0.6426240 0.2296299
## 0.4 3 0.8 0.50 150 0.6556111 0.2594202
## 0.4 3 0.8 0.75 50 0.7068265 0.3684256
## 0.4 3 0.8 0.75 100 0.7170829 0.3947839
## 0.4 3 0.8 0.75 150 0.7299034 0.4206251
## 0.4 3 0.8 1.00 50 0.6529138 0.2423832
## 0.4 3 0.8 1.00 100 0.6682651 0.2787767
## 0.4 3 0.8 1.00 150 0.6631702 0.2772318
##
## Tuning parameter 'gamma' was held constant at a value of 0
## Tuning parameter 'min_child_weight' was held constant at a value of 1
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were nrounds = 150, max_depth = 3, eta = 0.3, gamma = 0, colsample_bytree = 0.6, min_child_weight = 1 and subsample = 0.5.
mean_accuracy_xgb_model<- mean(xgb_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_xgb_model)
## [1] 0.6720511
FeatEval_Median_mean_accuracy_cv_xgb<-mean_accuracy_xgb_model
print(FeatEval_Median_mean_accuracy_cv_xgb)
## [1] 0.6720511
train_predictions <- predict(xgb_model, newdata = trainData_XGB1, type = "raw")
train_accuracy <- mean(train_predictions == trainData_XGB1$DX)
FeatEval_Median_xgb_trainAccuracy <- train_accuracy
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 1"
print(FeatEval_Median_xgb_trainAccuracy)
## [1] 1
predictions <- predict(xgb_model, newdata = testData_XGB1)
cm_FeatEval_Median_xgb <-caret::confusionMatrix(predictions,testData_XGB1$DX)
print(cm_FeatEval_Median_xgb)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CN MCI
## CN 36 21
## MCI 30 78
##
## Accuracy : 0.6909
## 95% CI : (0.6144, 0.7604)
## No Information Rate : 0.6
## P-Value [Acc > NIR] : 0.009814
##
## Kappa : 0.3411
##
## Mcnemar's Test P-Value : 0.262618
##
## Sensitivity : 0.5455
## Specificity : 0.7879
## Pos Pred Value : 0.6316
## Neg Pred Value : 0.7222
## Prevalence : 0.4000
## Detection Rate : 0.2182
## Detection Prevalence : 0.3455
## Balanced Accuracy : 0.6667
##
## 'Positive' Class : CN
##
cm_FeatEval_Median_xgb_Accuracy <-cm_FeatEval_Median_xgb$overall["Accuracy"]
cm_FeatEval_Median_xgb_Kappa <-cm_FeatEval_Median_xgb$overall["Kappa"]
print(cm_FeatEval_Median_xgb_Accuracy)
## Accuracy
## 0.6909091
print(cm_FeatEval_Median_xgb_Kappa)
## Kappa
## 0.3410853
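With CN as the positive class, the remaining statistics in the printout are column ratios of the same table: sensitivity is the fraction of true CN recovered, specificity the fraction of true MCI recovered, and balanced accuracy their mean. From the printed counts (Python for illustration only):

```python
# Confusion matrix from the printout: rows = prediction, cols = reference
#            CN   MCI
cm = [[36, 21],   # predicted CN
      [30, 78]]   # predicted MCI

sensitivity = cm[0][0] / (cm[0][0] + cm[1][0])   # 36 / 66 true CN recovered
specificity = cm[1][1] / (cm[0][1] + cm[1][1])   # 78 / 99 true MCI recovered
balanced = (sensitivity + specificity) / 2
print(round(sensitivity, 4), round(specificity, 4), round(balanced, 4))
# 0.5455 0.7879 0.6667, matching the confusionMatrix output
```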
importance_xgb_model<- varImp(xgb_model)
print(importance_xgb_model)
## xgbTree variable importance
##
## only 20 most important variables shown (out of 250)
##
## Overall
## age.now 100.00
## cg10978526 78.04
## cg09584650 70.19
## cg12228670 57.20
## cg04248279 54.43
## cg13739190 51.31
## cg00696044 48.75
## cg00962106 48.39
## cg24139837 45.77
## cg01921484 45.38
## cg04971651 45.21
## cg02225060 45.19
## cg17186592 44.80
## cg02772171 43.70
## cg10240127 41.78
## cg14564293 41.44
## cg12543766 40.18
## cg00084271 38.44
## cg04462915 38.17
## cg23432430 37.58
plot(importance_xgb_model, top = 20, main = "Variable Importance Plot")
importance_xgb_model_df<-importance_xgb_model$importance
importance <- xgb.importance(model = xgb_model$finalModel)
xgb.plot.importance(importance_matrix = importance)
ordered_importance <- importance[order(-importance$Importance), ]
print(ordered_importance)
## Feature Gain Cover Frequency Importance
## <char> <num> <num> <num> <num>
## 1: age.now 3.277355e-02 0.0283789674 0.019342360 3.277355e-02
## 2: cg10978526 2.557485e-02 0.0148559741 0.009671180 2.557485e-02
## 3: cg09584650 2.300364e-02 0.0204734458 0.005802708 2.300364e-02
## 4: cg12228670 1.874521e-02 0.0120612732 0.005802708 1.874521e-02
## 5: cg04248279 1.783834e-02 0.0114068159 0.011605416 1.783834e-02
## ---
## 209: cg09289202 8.733792e-05 0.0005367707 0.001934236 8.733792e-05
## 210: cg11540596 8.682417e-05 0.0005966699 0.001934236 8.682417e-05
## 211: cg08198851 3.173766e-05 0.0004563947 0.001934236 3.173766e-05
## 212: cg03395511 3.079788e-05 0.0005081172 0.001934236 3.079788e-05
## 213: cg01023242 1.740397e-05 0.0007212398 0.001934236 1.740397e-05
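caret's `varImp` table (0-100 scale) and the raw `xgb.importance` gains describe the same ranking: `varImp` rescales each gain as a percentage of the largest one. Reproducing the top entries from the raw Gain values printed above (Python for illustration only):

```python
# Raw Gain values from the xgb.importance printout above
gains = {"age.now": 3.277355e-02,
         "cg10978526": 2.557485e-02,
         "cg09584650": 2.300364e-02}

top = max(gains.values())
scaled = {f: 100 * g / top for f, g in gains.items()}
print(round(scaled["age.now"], 2),
      round(scaled["cg10978526"], 2),
      round(scaled["cg09584650"], 2))  # 100.0 78.04 70.19, as in the varImp table
```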
stopCluster(c2)
registerDoSEQ()
if(METHOD_FEATURE_FLAG == 5){
prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
roc_curve <- roc(testData_XGB1$DX,
prob_predictions[, "MCI"],
levels = rev(levels(testData_XGB1$DX)))
auc_value <- roc_curve$auc
FeatEval_Median_xgb_AUC <-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls > cases
## Area under the curve: 0.7634
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
roc_curve <- roc(testData_XGB1$DX,
prob_predictions[, "Dementia"],
levels = rev(levels(testData_XGB1$DX)))
auc_value <- roc_curve$auc
FeatEval_Median_xgb_AUC <-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
roc_curve <- roc(testData_XGB1$DX,
prob_predictions[, "CI"],
levels = rev(levels(testData_XGB1$DX)))
auc_value <- roc_curve$auc
FeatEval_Median_xgb_AUC <-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 1){
prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
roc_curves <- list()
auc_values <- numeric()
# use the same test set the predictions were made on
classes <- levels(testData_XGB1$DX)
for (class in classes) {
binary_labels <- ifelse(testData_XGB1$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
plot(roc_curves[[1]], col = 2,
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = i + 1, lwd = 2)
}
legend("bottomright", legend = classes, col = seq_along(classes) + 1, lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
FeatEval_Median_xgb_AUC <-mean_auc
}
print(FeatEval_Median_xgb_AUC)
## Area under the curve: 0.7634
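The AUC reported by pROC has a direct probabilistic reading: it is the chance that a randomly chosen case receives a higher predicted probability than a randomly chosen control (the Mann-Whitney U statistic scaled to [0, 1]). A self-contained base-R sketch on simulated scores (toy values, not taken from the model above):

```r
# Mann-Whitney view of the AUC: fraction of (case, control) pairs where the
# case scores higher, counting ties as one half.
set.seed(42)
scores_pos <- rnorm(60, mean = 1)   # simulated scores for cases
scores_neg <- rnorm(90, mean = 0)   # simulated scores for controls
auc_mw <- mean(outer(scores_pos, scores_neg, ">") +
               0.5 * outer(scores_pos, scores_neg, "=="))
round(auc_mw, 4)   # close to pnorm(1 / sqrt(2)) ~ 0.76 for these distributions
```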
library(caret)
library(randomForest)
df_RFM1<-processed_data
featureName_RFM1<-AfterProcess_FeatureName
set.seed(123)
trainIndex <- createDataPartition(df_RFM1$DX, p = 0.7, list = FALSE)
train_data_RFM1 <- df_RFM1[trainIndex, ]
test_data_RFM1 <- df_RFM1[-trainIndex, ]
X_train_RFM1 <- subset(train_data_RFM1, select = -DX)
y_train_RFM1 <- train_data_RFM1$DX
X_test_RFM1 <- subset(test_data_RFM1, select = -DX)
y_test_RFM1 <- test_data_RFM1$DX
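createDataPartition draws a stratified split, so the CN/MCI mix in the training fold matches the full data. A base-R sketch of the same idea on toy labels (the 66/99 counts mirror the test-set class counts reported below, but the code itself is illustrative):

```r
# Stratified 70/30 split by hand: sample 70% within each class separately.
set.seed(123)
dx  <- factor(rep(c("CN", "MCI"), times = c(66, 99)))          # toy label vector
idx <- unlist(lapply(split(seq_along(dx), dx),                 # indices grouped by class...
                     function(i) sample(i, round(0.7 * length(i)))))  # ...70% from each
prop.table(table(dx[idx]))   # class mix in the "training" fold: 0.4 / 0.6, same as overall
```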
ctrl <- trainControl(method = "cv", number = 5, classProbs = TRUE)
rf_model <- caret::train(
DX ~ ., data = train_data_RFM1,
method = "rf", trControl = ctrl,
metric = "Accuracy",
importance = TRUE
)
print(rf_model)
## Random Forest
##
## 389 samples
## 250 predictors
## 2 classes: 'CN', 'MCI'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 311, 312, 311, 311, 311
## Resampling results across tuning parameters:
##
## mtry Accuracy Kappa
## 2 0.6169497 0.04597915
## 126 0.6554446 0.17684788
## 250 0.6631702 0.19453084
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 250.
mean_accuracy_rf_model<- mean(rf_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_rf_model)
## [1] 0.6451881
FeatEval_Median_mean_accuracy_cv_rf<-mean_accuracy_rf_model
print(FeatEval_Median_mean_accuracy_cv_rf)
## [1] 0.6451881
train_predictions <- predict(rf_model, newdata = train_data_RFM1, type = "raw")
train_accuracy <- mean(train_predictions == train_data_RFM1$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 1"
FeatEval_Median_rf_trainAccuracy<-train_accuracy
print(FeatEval_Median_rf_trainAccuracy)
## [1] 1
predictions <- predict(rf_model, newdata = test_data_RFM1)
cm_FeatEval_Median_rf<-caret::confusionMatrix(predictions,test_data_RFM1$DX)
print(cm_FeatEval_Median_rf)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CN MCI
## CN 15 5
## MCI 51 94
##
## Accuracy : 0.6606
## 95% CI : (0.5829, 0.7324)
## No Information Rate : 0.6
## P-Value [Acc > NIR] : 0.06455
##
## Kappa : 0.2
##
## Mcnemar's Test P-Value : 1.817e-09
##
## Sensitivity : 0.22727
## Specificity : 0.94949
## Pos Pred Value : 0.75000
## Neg Pred Value : 0.64828
## Prevalence : 0.40000
## Detection Rate : 0.09091
## Detection Prevalence : 0.12121
## Balanced Accuracy : 0.58838
##
## 'Positive' Class : CN
##
cm_FeatEval_Median_rf_Accuracy<-cm_FeatEval_Median_rf$overall["Accuracy"]
print(cm_FeatEval_Median_rf_Accuracy)
## Accuracy
## 0.6606061
cm_FeatEval_Median_rf_Kappa<-cm_FeatEval_Median_rf$overall["Kappa"]
print(cm_FeatEval_Median_rf_Kappa)
## Kappa
## 0.2
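Cohen's Kappa compares the observed agreement against the agreement expected by chance from the marginal totals. The reported value of 0.2 can be reproduced by hand from the confusion matrix above:

```r
# Recompute accuracy and Kappa from the RF confusion matrix printed above.
cm <- matrix(c(15, 51, 5, 94), nrow = 2,
             dimnames = list(Prediction = c("CN", "MCI"),
                             Reference  = c("CN", "MCI")))
n  <- sum(cm)
po <- sum(diag(cm)) / n                      # observed agreement (accuracy)
pe <- sum(rowSums(cm) * colSums(cm)) / n^2   # chance agreement from the margins
kappa <- (po - pe) / (1 - pe)
round(c(accuracy = po, kappa = kappa), 4)    # matches the caret output: 0.6606, 0.2
```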
importance_rf_model <- varImp(rf_model)
print(importance_rf_model)
## rf variable importance
##
## only 20 most important variables shown (out of 250)
##
## Importance
## age.now 100.00
## cg06286533 77.88
## cg05234269 74.40
## cg00696044 73.40
## cg01153376 71.47
## cg10666341 68.78
## cg20685672 67.60
## cg16771215 66.87
## cg03924089 64.78
## cg23066280 64.36
## cg01128042 64.32
## cg05392160 63.55
## cg09289202 63.46
## cg15586958 61.24
## cg05161773 60.47
## cg27160885 60.13
## cg12953206 59.55
## cg26983017 59.48
## cg16655091 59.45
## cg22112152 58.97
plot(importance_rf_model, top = 20, main = "Variable Importance Plot")
importance_rf_model_df<-importance_rf_model$importance
if(METHOD_FEATURE_FLAG==5 ){
importance_rf_final_model <- varImp(rf_model$finalModel)
library(dplyr)
# arrange() drops data.frame rownames, so keep the feature names in a column first
Ordered_importance_rf_final_model <- importance_rf_final_model %>%
tibble::rownames_to_column("Feature") %>%
arrange(desc(MCI))
print(Ordered_importance_rf_final_model)
}
## CN MCI
## 1 4.245623348 4.245623348
## 2 2.764294008 2.764294008
## 3 2.530866177 2.530866177
## 4 2.464297621 2.464297621
## 5 2.335185954 2.335185954
## ... (245 further rows omitted; for this binary model the CN and MCI
## importance columns are identical)
if(METHOD_FEATURE_FLAG==4 || METHOD_FEATURE_FLAG==6 ){
importance_rf_final_model <- varImp(rf_model$finalModel)
library(dplyr)
# keep the feature names in a column; arrange() drops data.frame rownames
Ordered_importance_rf_final_model <- importance_rf_final_model %>%
tibble::rownames_to_column("Feature") %>%
arrange(desc(Dementia))
print(Ordered_importance_rf_final_model)
}
if(METHOD_FEATURE_FLAG==3 ){
importance_rf_final_model <- varImp(rf_model$finalModel)
library(dplyr)
# keep the feature names in a column; arrange() drops data.frame rownames
Ordered_importance_rf_final_model <- importance_rf_final_model %>%
tibble::rownames_to_column("Feature") %>%
arrange(desc(CI))
print(Ordered_importance_rf_final_model)
}
if(METHOD_FEATURE_FLAG==1){
# for the multi classification case,
# for each feature, we will choose the maximum importance value
# Add a column for the maximum importance
importance_rf_model_df$Feature<-rownames(importance_rf_model_df)
importance_rf_model_df <- importance_rf_model_df %>%
mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
arrange(desc(MaxImportance))
print(importance_rf_model_df)
}
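The max-over-classes aggregation used above can be illustrated on a tiny hand-made importance table (feature names and values here are toy, purely for illustration):

```r
# For each feature, keep the largest per-class importance, then rank by it.
imp <- data.frame(CN       = c(10, 40, 5),
                  Dementia = c(30, 20, 50),
                  MCI      = c(25, 35, 15),
                  row.names = c("cg_a", "cg_b", "cg_c"))   # toy CpG names
imp$Feature <- rownames(imp)
imp$MaxImportance <- pmax(imp$CN, imp$Dementia, imp$MCI)   # element-wise maximum
imp[order(-imp$MaxImportance), c("Feature", "MaxImportance")]
# cg_c (50) ranks first, then cg_b (40), then cg_a (30)
```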
if(METHOD_FEATURE_FLAG == 1){
importance_melted_rf_model_df <- importance_rf_model_df %>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_rf_model_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
if(METHOD_FEATURE_FLAG == 1){
print(importance_rf_model_df %>% head(20))
print("The top 20 features selected by the max-importance method:")
print(head(importance_rf_model_df,n=20)$Feature)
importance_melted_rf_model_df <- importance_rf_model_df %>%
head(20)%>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_rf_model_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
if(METHOD_FEATURE_FLAG == 5){
prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
roc_curve <- roc(test_data_RFM1$DX,
prob_predictions[, "MCI"],
levels = rev(levels(test_data_RFM1$DX)))
auc_value <- roc_curve$auc
FeatEval_Median_rf_AUC<-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls > cases
## Area under the curve: 0.7715
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
roc_curve <- roc(test_data_RFM1$DX,
prob_predictions[, "Dementia"],
levels = rev(levels(test_data_RFM1$DX)))
auc_value <- roc_curve$auc
FeatEval_Median_rf_AUC<-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
roc_curve <- roc(test_data_RFM1$DX,
prob_predictions[, "CI"],
levels = rev(levels(test_data_RFM1$DX)))
auc_value <- roc_curve$auc
FeatEval_Median_rf_AUC<-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 1){
prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
roc_curves <- list()
auc_values <- numeric()
# use the same test set the predictions were made on
classes <- levels(test_data_RFM1$DX)
for (class in classes) {
binary_labels <- ifelse(test_data_RFM1$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
plot(roc_curves[[1]], col = 2,
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = i + 1, lwd = 2)
}
legend("bottomright", legend = classes, col = seq_along(classes) + 1, lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
FeatEval_Median_rf_AUC<-mean_auc
}
print(FeatEval_Median_rf_AUC)
## Area under the curve: 0.7715
df_SVM<-processed_data
featureName_SVM1<-AfterProcess_FeatureName
trainIndex <- createDataPartition(df_SVM$DX, p = 0.7, list = FALSE)
train_data_SVM1 <- df_SVM[trainIndex, ]
test_data_SVM1 <- df_SVM[-trainIndex, ]
X_train_SVM1 <- subset(train_data_SVM1,select = -DX)
y_train_SVM1 <- train_data_SVM1$DX
X_test_SVM1 <- subset(test_data_SVM1, select= -DX )
y_test_SVM1 <- test_data_SVM1$DX
train_control <- trainControl(method = "cv", number = 5, classProbs = TRUE)
svm_model <- caret::train(DX ~ ., data = train_data_SVM1,
method = "svmRadial",
trControl = train_control)
print(svm_model)
## Support Vector Machines with Radial Basis Function Kernel
##
## 389 samples
## 250 predictors
## 2 classes: 'CN', 'MCI'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 311, 311, 312, 311, 311
## Resampling results across tuning parameters:
##
## C Accuracy Kappa
## 0.25 0.8405261 0.6786741
## 0.50 0.8405261 0.6786741
## 1.00 0.8534466 0.6939272
##
## Tuning parameter 'sigma' was held constant at a value of 0.002031332
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were sigma = 0.002031332 and C = 1.
print(svm_model$bestTune)
## sigma C
## 3 0.002031332 1
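caret's `svmRadial` grid tunes `C` while `sigma` is held at a kernlab `sigest`-style estimate. The radial basis kernel itself is K(x, z) = exp(-sigma * ||x - z||^2), which is easy to evaluate directly (the two feature vectors below are made-up beta values, only `sigma` comes from the output above):

```r
# Evaluate the RBF kernel for the sigma caret selected above.
sigma <- 0.002031332                  # value reported by svm_model$bestTune
x <- c(0.5, 0.9); z <- c(0.4, 0.7)    # two toy methylation beta-value vectors
K <- exp(-sigma * sum((x - z)^2))     # K(x, z) = exp(-sigma * ||x - z||^2)
K   # very close to 1: with sigma this small, the kernel decays slowly
```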
mean_accuracy_svm_model<- mean(svm_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_svm_model)
## [1] 0.8448329
FeatEval_Median_mean_accuracy_cv_svm<-mean_accuracy_svm_model
print(FeatEval_Median_mean_accuracy_cv_svm)
## [1] 0.8448329
train_predictions <- predict(svm_model, newdata = train_data_SVM1)
train_accuracy <- mean(train_predictions == train_data_SVM1$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 0.992287917737789"
FeatEval_Median_svm_trainAccuracy <- train_accuracy
print(FeatEval_Median_svm_trainAccuracy)
## [1] 0.9922879
predictions <- predict(svm_model, newdata = test_data_SVM1)
cm_FeatEval_Median_svm<-caret::confusionMatrix(predictions,test_data_SVM1$DX)
print(cm_FeatEval_Median_svm)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CN MCI
## CN 56 16
## MCI 10 83
##
## Accuracy : 0.8424
## 95% CI : (0.7777, 0.8944)
## No Information Rate : 0.6
## P-Value [Acc > NIR] : 1.263e-11
##
## Kappa : 0.6766
##
## Mcnemar's Test P-Value : 0.3268
##
## Sensitivity : 0.8485
## Specificity : 0.8384
## Pos Pred Value : 0.7778
## Neg Pred Value : 0.8925
## Prevalence : 0.4000
## Detection Rate : 0.3394
## Detection Prevalence : 0.4364
## Balanced Accuracy : 0.8434
##
## 'Positive' Class : CN
##
cm_FeatEval_Median_svm_Accuracy <- cm_FeatEval_Median_svm$overall["Accuracy"]
cm_FeatEval_Median_svm_Kappa <- cm_FeatEval_Median_svm$overall["Kappa"]
print(cm_FeatEval_Median_svm_Accuracy)
## Accuracy
## 0.8424242
print(cm_FeatEval_Median_svm_Kappa)
## Kappa
## 0.6766169
Let’s take a look at the feature importance of the trained model.
library(iml)
# pass the target as a column name so iml removes DX from the feature set
predictor_SVM <- Predictor$new(svm_model, data = df_SVM, y = "DX")
importance_SVM <- FeatureImp$new(predictor_SVM,loss="ce")
print(importance_SVM)
## Interpretation method: FeatureImp
## error function: ce
##
## Analysed predictor:
## Prediction task: classification
## Classes:
##
## Analysed data:
## Sampling from data.frame with 554 rows and 251 columns.
##
##
## Head of results:
## feature importance.05 importance importance.95 permutation.error
## 1 cg02495179 0.8965517 1.068966 1.124138 0.05595668
## 2 cg17129965 0.9793103 1.034483 1.062069 0.05415162
## 3 cg11169344 0.9379310 1.034483 1.034483 0.05415162
## 4 cg04971651 0.9724138 1.034483 1.034483 0.05415162
## 5 cg20398163 0.9793103 1.034483 1.068966 0.05415162
## 6 cg23352245 0.9655172 1.034483 1.034483 0.05415162
plot(importance_SVM)
library(vip)
vip(svm_model, method = "permute", train = train_data_SVM1, target = "DX",
nsim = 10, metric = "bal_accuracy", pred_wrapper = predict)
importance_SVM_df<-importance_SVM$results
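`FeatureImp` estimates importance by permuting one feature at a time and measuring how much the loss (here classification error, `loss = "ce"`) grows; the reported importance is the ratio of permuted to original error. A self-contained base-R sketch of that procedure on toy data with a hand-written predictor (all names below are illustrative, not from the ADNI data):

```r
# Permutation importance by hand: shuffle a feature, remeasure the error,
# and report permuted-error / original-error, as iml does.
set.seed(1)
toy <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
toy$y <- factor(ifelse(toy$x1 + 0.1 * rnorm(200) > 0, "MCI", "CN"))
predict_fun <- function(df) factor(ifelse(df$x1 > 0, "MCI", "CN"),
                                   levels = levels(toy$y))
base_err <- mean(predict_fun(toy) != toy$y)        # original classification error
perm_importance <- sapply(c("x1", "x2"), function(f) {
  shuffled <- toy
  shuffled[[f]] <- sample(shuffled[[f]])           # break the feature/target link
  perm_err <- mean(predict_fun(shuffled) != shuffled$y)
  perm_err / max(base_err, 1e-12)                  # error ratio, as iml reports
})
perm_importance   # x1 (the informative feature) scores far above x2
```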
if(METHOD_FEATURE_FLAG == 5){
library(e1071)
prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
roc_curve <- roc(test_data_SVM1$DX,
prob_predictions[, "MCI"],
levels = rev(levels(test_data_SVM1$DX)))
print(roc_curve)
print("The auc value is:")
auc_value <- roc_curve$auc
FeatEval_Median_svm_AUC <- auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls > cases
##
## Call:
## roc.default(response = test_data_SVM1$DX, predictor = prob_predictions[, "MCI"], levels = rev(levels(test_data_SVM1$DX)))
##
## Data: prob_predictions[, "MCI"] in 99 controls (test_data_SVM1$DX MCI) > 66 cases (test_data_SVM1$DX CN).
## Area under the curve: 0.9362
## [1] "The auc value is:"
## Area under the curve: 0.9362
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
library(e1071)
prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
roc_curve <- roc(test_data_SVM1$DX,
prob_predictions[, "Dementia"],
levels = rev(levels(test_data_SVM1$DX)))
print(roc_curve)
print("The auc value is:")
auc_value <- roc_curve$auc
FeatEval_Median_svm_AUC <- auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
library(e1071)
prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
roc_curve <- roc(test_data_SVM1$DX,
prob_predictions[, "CI"],
levels = rev(levels(test_data_SVM1$DX)))
print(roc_curve)
print("The auc value is:")
auc_value <- roc_curve$auc
FeatEval_Median_svm_AUC <- auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 1){
prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
roc_curves <- list()
auc_values <- numeric()
# use the same test set the predictions were made on
classes <- levels(test_data_SVM1$DX)
for (class in classes) {
binary_labels <- ifelse(test_data_SVM1$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
plot(roc_curves[[1]], col = 2,
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = i + 1, lwd = 2)
}
legend("bottomright", legend = classes, col = seq_along(classes) + 1, lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
FeatEval_Median_svm_AUC <- mean_auc
}
print(FeatEval_Median_svm_AUC )
## Area under the curve: 0.9362
Performance of the output features selected by the frequency method
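Frequency (common-feature) selection keeps CpGs that reach the top of the importance ranking in several of the trained models. A minimal sketch of the idea with short, hand-picked top lists (the feature names appear in the importance tables above, but their list membership here is illustrative):

```r
# Keep features that appear in the top list of at least two of the three models.
top_xgb <- c("age.now", "cg10978526", "cg09584650", "cg12228670")
top_rf  <- c("age.now", "cg06286533", "cg10978526", "cg00696044")
top_svm <- c("age.now", "cg10978526", "cg17129965", "cg11169344")
freq <- table(c(top_xgb, top_rf, top_svm))   # in how many top lists each feature appears
sort(names(freq[freq >= 2]))                 # -> "age.now" "cg10978526"
```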
processed_dataFrame<-df_process_Output_freq
processed_data<-output_Frequency_Feature
AfterProcess_FeatureName<-df_process_frequency_FeatureName
print(head(output_Frequency_Feature))
## # A tibble: 6 × 276
## DX cg27272246 cg14710850 cg00004073 cg13405878 cg02981548 cg20685672 cg08788093 cg03924089 cg16652920 cg24433124 cg12543766 cg14687298 cg17129965 cg06833284 cg11169344 cg14168080 cg26081710
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 MCI 0.862 0.805 0.0293 0.455 0.134 0.671 0.0391 0.792 0.944 0.132 0.510 0.0421 0.897 0.913 0.672 0.419 0.875
## 2 CN 0.871 0.809 0.0279 0.786 0.522 0.793 0.609 0.737 0.943 0.599 0.887 0.148 0.881 0.900 0.822 0.442 0.920
## 3 CN 0.810 0.829 0.646 0.758 0.510 0.661 0.884 0.851 0.946 0.819 0.0282 0.243 0.886 0.610 0.594 0.436 0.880
## 4 MCI 0.769 0.850 0.412 0.448 0.568 0.0829 0.522 0.869 0.953 0.592 0.818 0.513 0.874 0.0381 0.868 0.946 0.917
## 5 CN 0.440 0.821 0.393 0.340 0.508 0.845 0.434 0.748 0.949 0.574 0.457 0.0362 0.882 0.915 0.155 0.399 0.923
## 6 MCI 0.750 0.845 0.404 0.734 0.530 0.657 0.773 0.753 0.949 0.606 0.804 0.241 0.776 0.901 0.623 0.950 0.882
## # ℹ 258 more variables: cg02631626 <dbl>, cg17042243 <dbl>, cg08861434 <dbl>, cg07640670 <dbl>, cg20398163 <dbl>, cg03979311 <dbl>, cg10978526 <dbl>, cg22933800 <dbl>, cg07028768 <dbl>,
## # cg04248279 <dbl>, cg07504457 <dbl>, cg14175932 <dbl>, cg26219488 <dbl>, cg06231502 <dbl>, cg06115838 <dbl>, cg06483046 <dbl>, cg07104639 <dbl>, cg21392220 <dbl>, cg18819889 <dbl>,
## # cg08779649 <dbl>, cg08198851 <dbl>, cg17186592 <dbl>, cg06546677 <dbl>, cg26705599 <dbl>, cg23517115 <dbl>, cg00819121 <dbl>, cg11268585 <dbl>, cg13815695 <dbl>, cg06286533 <dbl>,
## # cg23352245 <dbl>, cg26679884 <dbl>, cg18816397 <dbl>, cg15633912 <dbl>, cg10681981 <dbl>, cg01128042 <dbl>, cg02772171 <dbl>, cg06394820 <dbl>, cg17738613 <dbl>, cg04242342 <dbl>,
## # cg22741595 <dbl>, cg02887598 <dbl>, cg22071943 <dbl>, cg04971651 <dbl>, cg19097407 <dbl>, cg15586958 <dbl>, cg08857872 <dbl>, cg02621446 <dbl>, cg00553601 <dbl>, cg00767423 <dbl>,
## # cg18285382 <dbl>, cg15098922 <dbl>, cg15138543 <dbl>, cg08745107 <dbl>, cg00696044 <dbl>, cg03749159 <dbl>, cg21415084 <dbl>, cg05155812 <dbl>, cg22112152 <dbl>, cg14293999 <dbl>,
## # cg16655091 <dbl>, cg03327352 <dbl>, cg06961873 <dbl>, cg08880261 <dbl>, cg11286989 <dbl>, cg06118351 <dbl>, cg25366315 <dbl>, cg26853071 <dbl>, cg25436480 <dbl>, cg26983017 <dbl>, …
print(df_process_frequency_FeatureName)
## [1] "cg27272246" "cg14710850" "cg00004073" "cg13405878" "cg02981548" "cg20685672" "cg08788093" "cg03924089" "cg16652920" "cg24433124" "cg12543766" "cg14687298" "cg17129965" "cg06833284" "cg11169344"
## [16] "cg14168080" "cg26081710" "cg02631626" "cg17042243" "cg08861434" "cg07640670" "cg20398163" "cg03979311" "cg10978526" "cg22933800" "cg07028768" "cg04248279" "cg07504457" "cg14175932" "cg26219488"
## [31] "cg06231502" "cg06115838" "cg06483046" "cg07104639" "cg21392220" "cg18819889" "cg08779649" "cg08198851" "cg17186592" "cg06546677" "cg26705599" "cg23517115" "cg00819121" "cg11268585" "cg13815695"
## [46] "cg06286533" "cg23352245" "cg26679884" "cg18816397" "cg15633912" "cg10681981" "cg01128042" "cg02772171" "cg06394820" "cg17738613" "cg04242342" "cg22741595" "cg02887598" "cg22071943" "cg04971651"
## [61] "cg19097407" "cg15586958" "cg08857872" "cg02621446" "cg00553601" "cg00767423" "cg18285382" "cg15098922" "cg15138543" "cg08745107" "cg00696044" "cg03749159" "cg21415084" "cg05155812" "cg22112152"
## [76] "cg14293999" "cg16655091" "cg03327352" "cg06961873" "cg08880261" "cg11286989" "cg06118351" "cg25366315" "cg26853071" "cg25436480" "cg26983017" "cg26901661" "cg21139150" "cg10738049" "cg03129555"
## [91] "cg17018422" "cg04664583" "cg03660162" "cg15600437" "cg00322003" "cg11331837" "cg23066280" "cg11187460" "cg23159970" "cg14228103" "cg09584650" "cg04728936" "cg14507637" "cg23658987" "cg11019791"
## [106] "cg05392160" "cg04412904" "cg25306893" "cg21812850" "cg24139837" "cg03737947" "cg12953206" "cg12702014" "cg20300784" "cg17429539" "cg05234269" "cg02356645" "cg03723481" "cg14564293" "cg25598710"
## [121] "cg21507367" "cg04831745" "cg18526121" "cg04462915" "cg21783012" "cg16089727" "PC1" "cg02372404" "cg12228670" "cg05570109" "cg12784167" "cg10240127" "PC2" "cg23432430" "cg21243064"
## [136] "cg02225060" "cg07480955" "cg19471911" "cg00962106" "cg09015880" "cg07158503" "cg00086247" "cg06634367" "cg16715186" "cg23923019" "cg11438323" "cg08138245" "cg13739190" "cg10091792" "cg12146221"
## [151] "cg06403901" "cg20078646" "cg00084271" "cg18918831" "cg01921484" "cg23836570" "cg05130642" "cg19799454" "cg03395511" "cg09727210" "cg00154902" "cg22169467" "cg26069044" "cg17653352" "cg10738648"
## [166] "cg22666875" "cg17268094" "cg05876883" "cg16779438" "cg09289202" "cg22542451" "cg16771215" "cg04316537" "cg25879395" "cg18150287" "cg27160885" "cg08514194" "cg14623940" "cg22535849" "cg06715136"
## [181] "cg09247979" "cg15775217" "cg19377607" "cg15535896" "cg08669168" "cg01549082" "cg07634717" "cg12063064" "cg24883219" "cg13573375" "cg06960717" "cg05850457" "cg19301366" "cg06864789" "cg14240646"
## [196] "cg03635532" "cg18993517" "cg27577781" "cg25649515" "cg00939409" "cg07478795" "cg06371647" "cg04645024" "cg18949721" "cg09216282" "cg06697310" "cg21388339" "cg15501526" "cg27086157" "cg21209485"
## [211] "cg20208879" "cg12012426" "cg01023242" "cg11401796" "cg02246922" "cg10039445" "cg26948066" "cg05891136" "cg12776173" "cg08914944" "cg14582632" "cg05799088" "cg11540596" "cg16536985" "cg03549208"
## [226] "cg10666341" "cg01008088" "cg03600007" "cg16180556" "cg15985500" "cg03982462" "cg02550738" "cg10993865" "cg14192979" "cg07227024" "cg09708852" "cg16571124" "cg18857647" "cg16202259" "cg12421087"
## [241] "cg03796003" "cg10788927" "cg15491125" "cg06880438" "cg12501287" "cg24470466" "cg16405337" "cg00272795" "cg04875706" "cg11882358" "cg14307563" "cg06012903" "cg11227702" "age.now" "cg04718469"
## [256] "cg25208881" "cg00689685" "cg12333628" "cg11133939" "cg01933473" "cg11314779" "cg24634455" "cg05161773" "cg02464073" "cg04768387" "cg24851651" "cg01910713" "cg14649234" "cg08896901" "cg03088219"
## [271] "cg26642936" "cg01153376" "cg17061760" "cg04888234" "cg09785377"
print(length(df_process_frequency_FeatureName))
## [1] 275
Num_KeyFea_Frequency <- length(df_process_frequency_FeatureName)
print(head(df_process_Output_freq))
## DX cg27272246 cg14710850 cg00004073 cg13405878 cg02981548 cg20685672 cg08788093 cg03924089 cg16652920 cg24433124 cg12543766 cg14687298 cg17129965 cg06833284 cg11169344 cg14168080
## 200223270003_R02C01 MCI 0.8615873 0.8048592 0.02928535 0.4549662 0.1342571 0.6712101 0.03911678 0.7920449 0.9436000 0.1316610 0.51028134 0.04206702 0.8972140 0.9125144 0.6720163 0.4190123
## 200223270003_R03C01 CN 0.8705287 0.8090950 0.02787198 0.7858042 0.5220037 0.7932091 0.60934160 0.7370283 0.9431222 0.5987648 0.88741539 0.14813581 0.8806673 0.9003482 0.8215477 0.4420256
## 200223270003_R06C01 CN 0.8103777 0.8285902 0.64576463 0.7583938 0.5098965 0.6613646 0.88380243 0.8506756 0.9457161 0.8188082 0.02818501 0.24260002 0.8857237 0.6097933 0.5941114 0.4355521
## cg26081710 cg02631626 cg17042243 cg08861434 cg07640670 cg20398163 cg03979311 cg10978526 cg22933800 cg07028768 cg04248279 cg07504457 cg14175932 cg26219488 cg06231502 cg06115838
## 200223270003_R02C01 0.8751040 0.6280766 0.2502905 0.8768306 0.58296513 0.1728144 0.86644909 0.5671930 0.4830774 0.4496851 0.8534976 0.7116230 0.5746953 0.9336638 0.7784451 0.8847724
## 200223270003_R03C01 0.9198212 0.1951736 0.2933475 0.4352647 0.55225610 0.8728944 0.06199853 0.9095713 0.4142525 0.8536078 0.8458854 0.6854539 0.8779027 0.9134707 0.7964278 0.8447916
## 200223270003_R06C01 0.8801892 0.2699849 0.2725457 0.8698813 0.04058533 0.2623391 0.72615553 0.8945157 0.3956683 0.8356936 0.8332786 0.7205633 0.7288239 0.9261878 0.7706160 0.8805585
## cg06483046 cg07104639 cg21392220 cg18819889 cg08779649 cg08198851 cg17186592 cg06546677 cg26705599 cg23517115 cg00819121 cg11268585 cg13815695 cg06286533 cg23352245 cg26679884
## 200223270003_R02C01 0.04383925 0.6772717 0.8726204 0.9156157 0.44449401 0.6578905 0.9230463 0.4472216 0.8585917 0.2151144 0.9207001 0.2521544 0.9267057 0.2734841 0.9377232 0.6793815
## 200223270003_R03C01 0.50720277 0.7123879 0.8563905 0.9004455 0.45076825 0.6578186 0.8593448 0.8484609 0.8613854 0.9131440 0.9281472 0.8535791 0.6859729 0.9354924 0.9375774 0.1848705
## 200223270003_R06C01 0.89604910 0.8099688 0.8466199 0.9054439 0.04810217 0.1272153 0.8467599 0.5636023 0.4332832 0.8328364 0.9327211 0.9121931 0.6509046 0.8696546 0.5932742 0.1701734
## cg18816397 cg15633912 cg10681981 cg01128042 cg02772171 cg06394820 cg17738613 cg04242342 cg22741595 cg02887598 cg22071943 cg04971651 cg19097407 cg15586958 cg08857872 cg02621446
## 200223270003_R02C01 0.5472925 0.1605530 0.7035090 0.9113420 0.9182018 0.8513195 0.6879612 0.8206769 0.6525533 0.04020908 0.8705217 0.8902474 0.1417931 0.9058263 0.3395280 0.8731313
## 200223270003_R03C01 0.4940355 0.9333421 0.7382662 0.5328806 0.5660559 0.8695521 0.6582258 0.8167892 0.1730013 0.67073881 0.2442648 0.9219452 0.8367297 0.8957526 0.8181845 0.8095534
## 200223270003_R06C01 0.5337018 0.8737362 0.6971989 0.5222757 0.8995479 0.4415020 0.1022257 0.8040357 0.1550739 0.73408417 0.2644581 0.9035233 0.2276425 0.9121763 0.2970779 0.7511582
## cg00553601 cg00767423 cg18285382 cg15098922 cg15138543 cg08745107 cg00696044 cg03749159 cg21415084 cg05155812 cg22112152 cg14293999 cg16655091 cg03327352 cg06961873 cg08880261
## 200223270003_R02C01 0.05601299 0.9298253 0.3202927 0.9286092 0.7734778 0.02921338 0.55608424 0.9355921 0.8374415 0.4514427 0.8476101 0.2836710 0.6055295 0.8851712 0.5335591 0.40655904
## 200223270003_R03C01 0.58957701 0.2651854 0.2930577 0.9027517 0.2949313 0.78542320 0.07552381 0.9153921 0.8509420 0.9070932 0.8014136 0.9172023 0.7053336 0.8786878 0.5472606 0.85616966
## 200223270003_R06C01 0.62426500 0.8667808 0.8923595 0.8525611 0.2496147 0.02709928 0.79270858 0.9255807 0.8378237 0.4107396 0.7897897 0.9168166 0.8724479 0.3042310 0.9415177 0.03280808
## cg11286989 cg06118351 cg25366315 cg26853071 cg25436480 cg26983017 cg26901661 cg21139150 cg10738049 cg03129555 cg17018422 cg04664583 cg03660162 cg15600437 cg00322003 cg11331837
## 200223270003_R02C01 0.7590008 0.3633940 0.9182318 0.4233820 0.8425160 0.89868232 0.8951971 0.01853264 0.5441211 0.6079616 0.5262747 0.5572814 0.8691767 0.4885353 0.1759911 0.03692842
## 200223270003_R03C01 0.8533989 0.4714860 0.9209800 0.7451354 0.4994032 0.03145466 0.8754981 0.43223243 0.5232715 0.5785498 0.9029604 0.5881190 0.5160770 0.4894487 0.5702070 0.57150125
## 200223270003_R06C01 0.7313884 0.8655962 0.8972984 0.4228079 0.3494312 0.84677625 0.9021064 0.43772680 0.4875473 0.9137818 0.5100750 0.9352717 0.9026304 0.8551374 0.3077122 0.03182862
## cg23066280 cg11187460 cg23159970 cg14228103 cg09584650 cg04728936 cg14507637 cg23658987 cg11019791 cg05392160 cg04412904 cg25306893 cg21812850 cg24139837 cg03737947 cg12953206
## 200223270003_R02C01 0.07247841 0.03672179 0.61817246 0.9141064 0.08230254 0.2172057 0.9051258 0.79757644 0.8112324 0.9328933 0.05088595 0.6265392 0.7920645 0.07404605 0.91824910 0.2364836
## 200223270003_R03C01 0.57174588 0.92516409 0.57492600 0.8591302 0.09661586 0.1925451 0.9009460 0.07511718 0.7831231 0.2576881 0.07717659 0.8330282 0.7688711 0.04183445 0.92067153 0.2338141
## 200223270003_R06C01 0.80814756 0.03109553 0.03288909 0.1834348 0.52399749 0.2379376 0.9013686 0.10177571 0.4353250 0.8920726 0.08253743 0.6175380 0.7702792 0.05657120 0.03638091 0.6638030
## cg12702014 cg20300784 cg17429539 cg05234269 cg02356645 cg03723481 cg14564293 cg25598710 cg21507367 cg04831745 cg18526121 cg04462915 cg21783012 cg16089727 PC1 cg02372404
## 200223270003_R02C01 0.7704049 0.86585964 0.7860900 0.93848584 0.5105903 0.4347333 0.52089591 0.3105752 0.9268560 0.61984995 0.4519781 0.03224861 0.9142369 0.86748697 -0.214185447 0.03598249
## 200223270003_R03C01 0.7848681 0.86609999 0.7100923 0.57461229 0.5833923 0.9007774 0.04000662 0.3088142 0.9290102 0.71214149 0.4762313 0.50740695 0.6694884 0.54996692 -0.172761185 0.02767285
## 200223270003_R06C01 0.8065993 0.03091187 0.7660838 0.02467208 0.5701428 0.8947417 0.04959460 0.8538820 0.9039559 0.06871768 0.4833367 0.02700644 0.9070112 0.05876736 -0.003667305 0.03127855
## cg12228670 cg05570109 cg12784167 cg10240127 PC2 cg23432430 cg21243064 cg02225060 cg07480955 cg19471911 cg00962106 cg09015880 cg07158503 cg00086247 cg06634367 cg16715186
## 200223270003_R02C01 0.8632174 0.3466611 0.81503498 0.9250553 0.01470293 0.9482702 0.5191606 0.6828159 0.3874638 0.6334393 0.9124898 0.5101716 0.5777146 0.1761275 0.8695793 0.2742789
## 200223270003_R03C01 0.8496212 0.5866750 0.02811410 0.9403255 0.05745834 0.9455418 0.9167649 0.8265195 0.3916889 0.8437175 0.5375751 0.8402106 0.6203543 0.2045043 0.9512930 0.7946153
## 200223270003_R06C01 0.8738949 0.4046471 0.03073269 0.9056974 0.08372861 0.9418716 0.4862205 0.5209552 0.4043390 0.6127952 0.5040948 0.8472063 0.6236025 0.6901217 0.9544163 0.8124316
## cg23923019 cg11438323 cg08138245 cg13739190 cg10091792 cg12146221 cg06403901 cg20078646 cg00084271 cg18918831 cg01921484 cg23836570 cg05130642 cg19799454 cg03395511 cg09727210
## 200223270003_R02C01 0.8555018 0.4863471 0.8115760 0.8510103 0.8670733 0.2049284 0.92790690 0.06198170 0.8103611 0.4891660 0.9098550 0.58688450 0.8575504 0.9178930 0.4491605 0.4240111
## 200223270003_R03C01 0.3058914 0.8984559 0.1109940 0.8358482 0.5864221 0.1814927 0.04783341 0.89537412 0.7877006 0.5333801 0.9093137 0.54259383 0.8644077 0.9106247 0.4835967 0.8812928
## 200223270003_R06C01 0.8108207 0.8722772 0.7444698 0.8419471 0.6087997 0.8619250 0.05253626 0.08725521 0.7706165 0.6406575 0.9204487 0.03267304 0.3661324 0.9066551 0.5523959 0.8493743
## cg00154902 cg22169467 cg26069044 cg17653352 cg10738648 cg22666875 cg17268094 cg05876883 cg16779438 cg09289202 cg22542451 cg16771215 cg04316537 cg25879395 cg18150287 cg27160885
## 200223270003_R02C01 0.5137741 0.3095010 0.9240187 0.9269778 0.44931577 0.8177182 0.5774753 0.9039064 0.8826150 0.4361103 0.5884356 0.88389723 0.8074830 0.88130864 0.7685695 0.2231606
## 200223270003_R03C01 0.8540746 0.2978585 0.9407223 0.9086951 0.49894016 0.8291957 0.9003262 0.9223308 0.5466924 0.4397504 0.8337068 0.07196933 0.8453340 0.02603438 0.7519166 0.8263885
## 200223270003_R06C01 0.8188126 0.8955853 0.9332131 0.9341775 0.05552024 0.3694180 0.8789368 0.4697980 0.8629492 0.4193555 0.8125084 0.09949974 0.4351695 0.91060615 0.2501173 0.2121179
## cg08514194 cg14623940 cg22535849 cg06715136 cg09247979 cg15775217 cg19377607 cg15535896 cg08669168 cg01549082 cg07634717 cg12063064 cg24883219 cg13573375 cg06960717 cg05850457
## 200223270003_R02C01 0.9128478 0.7623774 0.8847704 0.3400192 0.5070956 0.5707441 0.05377464 0.3382952 0.9226769 0.2924138 0.7483382 0.9357515 0.6430473 0.8670419 0.7030978 0.8183013
## 200223270003_R03C01 0.2613138 0.8732905 0.8609966 0.9259109 0.5706177 0.9168327 0.90570746 0.9253926 0.9164547 0.7065693 0.8254434 0.9436901 0.6822115 0.1733934 0.7653402 0.8313023
## 200223270003_R06C01 0.9202187 0.8661720 0.8808022 0.9079807 0.5090215 0.6042521 0.06636174 0.3320191 0.6362087 0.2895440 0.8181246 0.5490657 0.5296903 0.8888246 0.7206218 0.8161364
## cg19301366 cg06864789 cg14240646 cg03635532 cg18993517 cg27577781 cg25649515 cg00939409 cg07478795 cg06371647 cg04645024 cg18949721 cg09216282 cg06697310 cg21388339 cg15501526
## 200223270003_R02C01 0.8831393 0.05369415 0.5391334 0.8416733 0.2091538 0.8143535 0.9279829 0.2652180 0.8911007 0.8336894 0.7366541 0.2334245 0.9349248 0.8454609 0.2756268 0.6362531
## 200223270003_R03C01 0.8072679 0.46053125 0.2538363 0.8262538 0.2665896 0.8113185 0.9235753 0.8882671 0.9095543 0.8198684 0.8454827 0.2437792 0.9244259 0.8653044 0.2102269 0.6319253
## 200223270003_R06C01 0.8796022 0.87513655 0.1864902 0.8450480 0.2574003 0.8144274 0.5895839 0.8842646 0.8905903 0.8069537 0.0871902 0.2523095 0.9263996 0.2405168 0.7649181 0.7435100
## cg27086157 cg21209485 cg20208879 cg12012426 cg01023242 cg11401796 cg02246922 cg10039445 cg26948066 cg05891136 cg12776173 cg08914944 cg14582632 cg05799088 cg11540596 cg16536985
## 200223270003_R02C01 0.9224112 0.8865053 0.66986658 0.9165048 0.7210683 0.8453050 0.7301201 0.8833873 0.4685225 0.7797403 0.1038804 0.63423942 0.8475098 0.9023317 0.9238951 0.5789643
## 200223270003_R03C01 0.9219304 0.8714878 0.02423079 0.9434768 0.9032685 0.4319176 0.9447019 0.8954055 0.5026045 0.3310206 0.8730635 0.04392811 0.5526692 0.8779381 0.8926595 0.5418687
## 200223270003_R06C01 0.3224986 0.2292550 0.61769424 0.9220044 0.7831190 0.4370329 0.7202230 0.8832807 0.9101976 0.7965298 0.7009491 0.06893322 0.5288675 0.6887230 0.8820252 0.8392044
## cg03549208 cg10666341 cg01008088 cg03600007 cg16180556 cg15985500 cg03982462 cg02550738 cg10993865 cg14192979 cg07227024 cg09708852 cg16571124 cg18857647 cg16202259 cg12421087
## 200223270003_R02C01 0.9014487 0.9046648 0.8424817 0.5658487 0.39300141 0.8555262 0.8562777 0.6201457 0.9173768 0.06336040 0.04553128 0.2843446 0.9282854 0.8582332 0.9548726 0.5647607
## 200223270003_R03C01 0.8381784 0.6731062 0.2417656 0.6018832 0.07312155 0.8312198 0.6023731 0.9011727 0.9096170 0.06019651 0.05004286 0.2897826 0.9206431 0.8394132 0.3713483 0.5399655
## 200223270003_R06C01 0.9097817 0.6443180 0.2618620 0.8611166 0.20051805 0.8492103 0.8778458 0.9085849 0.4904519 0.52114282 0.06152206 0.8896436 0.9276842 0.2647491 0.4852461 0.5400348
## cg03796003 cg10788927 cg15491125 cg06880438 cg12501287 cg24470466 cg16405337 cg00272795 cg04875706 cg11882358 cg14307563 cg06012903 cg11227702 age.now cg04718469 cg25208881
## 200223270003_R02C01 0.89227099 0.8973154 0.9066635 0.8285145 0.4654925 0.7725300 0.6177291 0.46365138 0.5790542 0.89136326 0.1855966 0.7964595 0.86486075 82.4 0.8687522 0.1851956
## 200223270003_R03C01 0.86011668 0.2021398 0.3850991 0.7988881 0.5126917 0.9041432 0.6131717 0.82839260 0.9255066 0.04943344 0.8916957 0.1933431 0.49184121 78.6 0.7256813 0.9092286
## 200223270003_R06C01 0.08518098 0.2053075 0.9091504 0.7839538 0.9189144 0.1206738 0.6098664 0.07231279 0.9155843 0.80176322 0.8750052 0.1960773 0.02543724 80.4 0.8521881 0.9265502
## cg00689685 cg12333628 cg11133939 cg01933473 cg11314779 cg24634455 cg05161773 cg02464073 cg04768387 cg24851651 cg01910713 cg14649234 cg08896901 cg03088219 cg26642936 cg01153376
## 200223270003_R02C01 0.7019389 0.9227884 0.1282694 0.2589014 0.0242134 0.7796391 0.4120912 0.4842537 0.3131047 0.03674702 0.8573169 0.05165754 0.3581911 0.844002862 0.7619266 0.4872148
## 200223270003_R03C01 0.8634268 0.9092861 0.5920898 0.6726133 0.8966100 0.5188241 0.4154907 0.4998933 0.9465814 0.05358297 0.8538850 0.79015014 0.2467071 0.007435243 0.7023413 0.9639670
## 200223270003_R06C01 0.6378795 0.5084647 0.5127706 0.2642560 0.8908661 0.5325725 0.8526849 0.9077933 0.9098563 0.05968923 0.8110366 0.65413166 0.9225209 0.120155222 0.7099380 0.2242410
## cg17061760 cg04888234 cg09785377
## 200223270003_R02C01 0.08726914 0.8379655 0.9162088
## 200223270003_R03C01 0.59377488 0.4376314 0.9226292
## 200223270003_R06C01 0.83354475 0.8039047 0.6405193
## [ reached 'max' / getOption("max.print") -- omitted 3 rows ]
df_LRM1 <- processed_data
featureName_LRM1 <- AfterProcess_FeatureName
library(glmnet)
library(caret)
set.seed(123)
trainIndex <- createDataPartition(df_LRM1$DX, p = 0.7, list = FALSE)
trainData <- df_LRM1[trainIndex, ]
testData <- df_LRM1[-trainIndex, ]
dim(trainData)
## [1] 389 276
dim(testData)
## [1] 165 276
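`createDataPartition` draws a stratified random sample, so the CN/MCI proportions of `DX` are preserved in both `trainData` and `testData`. A minimal base-R emulation of that behaviour (the helper `split_stratified` and the toy labels are illustrative, not part of the pipeline):

```r
set.seed(123)

# Toy labels with the same flavour of imbalance as DX (illustrative only)
dx <- factor(rep(c("CN", "MCI"), times = c(40, 60)))

# Sample a fixed proportion within each class, as createDataPartition does
split_stratified <- function(y, p) {
  unlist(lapply(split(seq_along(y), y), function(idx) {
    sample(idx, size = round(p * length(idx)))
  }), use.names = FALSE)
}

train_idx <- split_stratified(dx, p = 0.7)
# Class proportions are preserved in the training subset: CN 0.4, MCI 0.6
print(prop.table(table(dx[train_idx])))
```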
ctrl <- trainControl(method = "cv", number = 5)
model_LRM1 <- caret::train(DX ~ ., data = trainData, method = "glmnet", trControl = ctrl)
predictions <- predict(model_LRM1, newdata = testData, type = "raw")
cm_FeatEval_Freq_LRM1<-caret::confusionMatrix(predictions, testData$DX)
print(cm_FeatEval_Freq_LRM1)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CN MCI
## CN 53 12
## MCI 13 87
##
## Accuracy : 0.8485
## 95% CI : (0.7845, 0.8995)
## No Information Rate : 0.6
## P-Value [Acc > NIR] : 3.459e-12
##
## Kappa : 0.6835
##
## Mcnemar's Test P-Value : 1
##
## Sensitivity : 0.8030
## Specificity : 0.8788
## Pos Pred Value : 0.8154
## Neg Pred Value : 0.8700
## Prevalence : 0.4000
## Detection Rate : 0.3212
## Detection Prevalence : 0.3939
## Balanced Accuracy : 0.8409
##
## 'Positive' Class : CN
##
prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
cm_FeatEval_Freq_LRM1_Accuracy <- cm_FeatEval_Freq_LRM1$overall["Accuracy"]
cm_FeatEval_Freq_LRM1_Kappa <- cm_FeatEval_Freq_LRM1$overall["Kappa"]
print(cm_FeatEval_Freq_LRM1_Accuracy)
## Accuracy
## 0.8484848
print(cm_FeatEval_Freq_LRM1_Kappa)
## Kappa
## 0.6835443
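The Accuracy and Kappa above can be reproduced directly from the 2x2 confusion matrix: accuracy is the fraction of correct predictions, and Cohen's kappa corrects that for the agreement expected by chance. Recomputing them from the counts printed above:

```r
# Confusion matrix counts from the report: rows = prediction, cols = reference
cm <- matrix(c(53, 13, 12, 87), nrow = 2,
             dimnames = list(Prediction = c("CN", "MCI"),
                             Reference  = c("CN", "MCI")))
n  <- sum(cm)
po <- sum(diag(cm)) / n                       # observed accuracy
pe <- sum(rowSums(cm) * colSums(cm)) / n^2    # agreement expected by chance
kappa <- (po - pe) / (1 - pe)
print(round(po, 7))     # 0.8484848
print(round(kappa, 7))  # 0.6835443
```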
print(model_LRM1)
## glmnet
##
## 389 samples
## 275 predictors
## 2 classes: 'CN', 'MCI'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 311, 312, 311, 311, 311
## Resampling results across tuning parameters:
##
## alpha lambda Accuracy Kappa
## 0.10 0.0001780646 0.8688312 0.7232240
## 0.10 0.0017806455 0.8713953 0.7279783
## 0.10 0.0178064554 0.8559441 0.6929420
## 0.55 0.0001780646 0.8201132 0.6188150
## 0.55 0.0017806455 0.7995005 0.5732916
## 0.55 0.0178064554 0.7455544 0.4571745
## 1.00 0.0001780646 0.7788545 0.5303598
## 1.00 0.0017806455 0.7609058 0.4923915
## 1.00 0.0178064554 0.7224109 0.4072839
##
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0.1 and lambda = 0.001780646.
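The grid above tunes the elastic-net mixing parameter `alpha` and the penalty strength `lambda`. For reference, glmnet minimises the penalised (here binomial) log-likelihood

$$\min_{\beta_0,\beta}\; -\frac{1}{N}\sum_{i=1}^{N}\ell\!\left(y_i,\beta_0+x_i^\top\beta\right) + \lambda\left[\frac{1-\alpha}{2}\lVert\beta\rVert_2^2+\alpha\lVert\beta\rVert_1\right]$$

so the selected `alpha = 0.1` is close to ridge regression, with only a small lasso component pushing coefficients exactly to zero.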
train_predictions <- predict(model_LRM1, newdata = trainData, type = "raw")
train_accuracy <- mean(train_predictions == trainData$DX)
FeatEval_Freq_LRM1_trainAccuracy<-train_accuracy
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 1"
print(FeatEval_Freq_LRM1_trainAccuracy)
## [1] 1
mean_accuracy_model_LRM1 <- mean(model_LRM1$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_model_LRM1)
## [1] 0.8026122
FeatEval_Freq_mean_accuracy_cv_LRM1 <- mean_accuracy_model_LRM1
print(FeatEval_Freq_mean_accuracy_cv_LRM1)
## [1] 0.8026122
library(caret)
library(pROC)
if (METHOD_FEATURE_FLAG == 5){
  prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX,
                   prob_predictions[, "MCI"],
                   levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Freq_LRM1_AUC <- auc_value
  print(roc_curve)
  print("The auc value is:")
  print(auc_value)
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls > cases
##
## Call:
## roc.default(response = testData$DX, predictor = prob_predictions[, "MCI"], levels = rev(levels(testData$DX)))
##
## Data: prob_predictions[, "MCI"] in 99 controls (testData$DX MCI) > 66 cases (testData$DX CN).
## Area under the curve: 0.9013
## [1] "The auc value is:"
## Area under the curve: 0.9013
if (METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG == 6){
  prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX,
                   prob_predictions[, "Dementia"],
                   levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Freq_LRM1_AUC <- auc_value
  print(roc_curve)
  print("The auc value is:")
  print(auc_value)
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 3){
  prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
  roc_curve <- roc(testData$DX,
                   prob_predictions[, "CI"],
                   levels = rev(levels(testData$DX)))
  auc_value <- roc_curve$auc
  FeatEval_Freq_LRM1_AUC <- auc_value
  print(roc_curve)
  print("The auc value is:")
  print(auc_value)
  plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 1){
  prob_predictions <- predict(model_LRM1, newdata = testData, type = "prob")
  roc_curves <- list()
  auc_values <- numeric()
  classes <- levels(testData$DX)
  # Build a one-versus-rest ROC curve per class
  for (class in classes) {
    binary_labels <- ifelse(testData$DX == class, 1, 0)
    roc_curve <- roc(binary_labels, prob_predictions[, class])
    roc_curves[[class]] <- roc_curve
    auc_values[class] <- roc_curve$auc
  }
  for (class in classes) {
    cat("Class:", class, "\n")
    print(roc_curves[[class]])
    cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
  }
  # Use the same palette indices for the curves and the legend
  # (curve k gets colour k + 1, matching the legend below)
  plot(roc_curves[[1]], col = 2,
       lwd = 2,
       main = "One versus Rest - ROC Curve for Each Class")
  for (i in 2:length(classes)) {
    lines(roc_curves[[i]], col = i + 1, lwd = 2)
  }
  legend("bottomright", legend = classes, col = seq_along(classes) + 1, lwd = 2)
}
if (METHOD_FEATURE_FLAG == 1){
  mean_auc <- mean(auc_values)
  cat("The mean AUC value across all classes with one versus rest method is:",
      mean_auc, "\n")
  FeatEval_Freq_LRM1_AUC <- mean_auc
}
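Each one-versus-rest AUC above is computed by pROC, but the quantity itself is just the Mann-Whitney U statistic on the predicted probabilities: the probability that a randomly chosen positive outscores a randomly chosen negative. A self-contained base-R sketch (the helper `auc_rank` and the toy vectors are illustrative, not the pROC implementation):

```r
# AUC as the probability that a random positive outscores a random negative
auc_rank <- function(labels, scores) {
  r  <- rank(scores)                 # midranks handle tied scores
  n1 <- sum(labels == 1)
  n0 <- sum(labels == 0)
  (sum(r[labels == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

labels <- c(1, 1, 1, 0, 0, 0)
scores <- c(0.9, 0.8, 0.4, 0.7, 0.3, 0.2)
# 8 of the 9 positive/negative pairs are ranked correctly
print(auc_rank(labels, scores))  # 0.8888889
```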
importance_model_LRM1 <- varImp(model_LRM1)
print(importance_model_LRM1)
## glmnet variable importance
##
## only 20 most important variables shown (out of 275)
##
## Overall
## PC2 100.00
## cg27272246 54.01
## cg00004073 49.63
## cg13405878 48.13
## cg14710850 48.01
## cg23432430 47.80
## cg14582632 46.85
## cg03924089 46.35
## cg20685672 46.20
## cg02981548 45.41
## cg09015880 44.01
## cg08788093 43.22
## cg07480955 42.54
## cg02225060 42.52
## cg19471911 42.01
## cg17129965 41.39
## cg12543766 41.22
## cg06833284 41.02
## cg11169344 40.61
## cg00962106 40.46
plot(importance_model_LRM1, top = 20, main = "Variable Importance Plot")
importance_model_LRM1_df <- importance_model_LRM1$importance
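For a glmnet model, `varImp` ranks predictors by the absolute value of their coefficients at the selected `lambda`, rescaled to a 0-100 score (which is why `PC2` sits at exactly 100 above). Because many elastic-net coefficients are exactly zero, the rescaling reduces to dividing by the largest absolute coefficient. A sketch with made-up coefficient values, not the fitted ones:

```r
# Hypothetical coefficients at the chosen lambda (illustrative values only)
coefs <- c(PC2 = -0.52, cg27272246 = 0.28, cg00004073 = -0.26, cg13405878 = 0.25)

# Rescale absolute magnitudes to a 0-100 importance score
importance <- 100 * abs(coefs) / max(abs(coefs))
print(sort(importance, decreasing = TRUE))
```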
if (METHOD_FEATURE_FLAG == 3 || METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG == 5 || METHOD_FEATURE_FLAG == 6){
  importance_final_model_LRM1 <- varImp(model_LRM1$finalModel)
  library(dplyr)
  ordered_importance_final_model_LRM1 <- importance_final_model_LRM1 %>% arrange(desc(Overall))
  print(ordered_importance_final_model_LRM1)
}
## Overall
## 1 5.1200392098
## 2 2.7654059793
## 3 2.5409752288
## 4 2.4644806404
## 5 2.4581149294
## 6 2.4472867134
## 7 2.3986563040
## 8 2.3733432989
## 9 2.3653696511
## 10 2.3249152354
## 11 2.2533160665
## 12 2.2129155405
## 13 2.1781140605
## 14 2.1770784065
## 15 2.1509554568
## 16 2.1190193850
## 17 2.1104781921
## 18 2.1003473899
## 19 2.0794423251
## 20 2.0713327396
## 21 2.0692335308
## 22 2.0548443398
## 23 2.0351199873
## 24 2.0103868147
## 25 1.9654746957
## 26 1.9620935575
## 27 1.9328297398
## 28 1.9184652666
## 29 1.8998874049
## 30 1.8455500813
## 31 1.8387866259
## 32 1.8029416554
## 33 1.7826369041
## 34 1.7694835858
## 35 1.7471623253
## 36 1.7280501848
## 37 1.7045293472
## 38 1.6979250181
## 39 1.6472074157
## 40 1.6269384869
## 41 1.6166736405
## 42 1.6094832785
## 43 1.6019535298
## 44 1.5288717591
## 45 1.5038596391
## 46 1.4954126266
## 47 1.4844928373
## 48 1.4613520563
## 49 1.4164797170
## 50 1.4110780268
## 51 1.3862586906
## 52 1.3784070180
## 53 1.3781788214
## 54 1.3752214382
## 55 1.3543263311
## 56 1.3501708826
## 57 1.3283983303
## 58 1.3184657513
## 59 1.3163432908
## 60 1.3128853421
## 61 1.3120972736
## 62 1.3095267118
## 63 1.3039907216
## 64 1.2856773330
## 65 1.2853744492
## 66 1.2776753486
## 67 1.2641186989
## 68 1.2610786290
## 69 1.2389407819
## 70 1.2377230036
## 71 1.2329135067
## 72 1.2267006445
## 73 1.2262983485
## 74 1.2231913548
## 75 1.1847311612
## 76 1.1763877330
## 77 1.1761472413
## 78 1.1760010302
## 79 1.1739145530
## 80 1.1721221850
## 81 1.1388328259
## 82 1.1335915203
## 83 1.1304637760
## 84 1.1264866309
## 85 1.1240359225
## 86 1.1132687505
## 87 1.0986455105
## 88 1.0949995590
## 89 1.0898318104
## 90 1.0821206387
## 91 1.0549573807
## 92 1.0471085888
## 93 1.0412121421
## 94 1.0347165140
## 95 1.0299645493
## 96 1.0290396104
## 97 1.0218172713
## 98 1.0170996465
## 99 1.0130712450
## 100 1.0123143256
## 101 1.0085422782
## 102 1.0007363905
## 103 0.9948676639
## 104 0.9872774749
## 105 0.9699940296
## 106 0.9689887838
## 107 0.9684645959
## 108 0.9655787534
## 109 0.9620303653
## 110 0.9609360850
## 111 0.9474780724
## 112 0.9447093185
## 113 0.9313704676
## 114 0.9302535542
## 115 0.9290982968
## 116 0.9235942322
## 117 0.9218621977
## 118 0.9132229470
## 119 0.9095138807
## 120 0.9039238611
## 121 0.8838307354
## 122 0.8775586230
## 123 0.8722262131
## 124 0.8713049559
## 125 0.8493292136
## 126 0.8480294142
## 127 0.8445546816
## 128 0.8442593454
## 129 0.8433060149
## 130 0.8427534978
## 131 0.8119752306
## 132 0.8016860452
## 133 0.7985544587
## 134 0.7904002894
## 135 0.7794594467
## 136 0.7717982698
## 137 0.7642078026
## 138 0.7575923257
## 139 0.7525872696
## 140 0.7522649875
## 141 0.7404008133
## 142 0.7403506309
## 143 0.7381832538
## 144 0.7347400258
## 145 0.7296798770
## 146 0.7287546138
## 147 0.7113024435
## 148 0.7032858698
## 149 0.7018654378
## 150 0.6919361291
## 151 0.6909684965
## 152 0.6873582427
## 153 0.6862092979
## 154 0.6787096714
## 155 0.6692280242
## 156 0.6685257014
## 157 0.6659353676
## 158 0.6656206763
## 159 0.6608992339
## 160 0.6514676367
## 161 0.6488441303
## 162 0.6385035919
## 163 0.6349597734
## 164 0.6337996031
## 165 0.6286026962
## 166 0.6239794150
## 167 0.6165027275
## 168 0.6143437191
## 169 0.6133377441
## 170 0.6097602593
## 171 0.6092672354
## 172 0.6090815828
## 173 0.6077373811
## 174 0.6048398978
## 175 0.5992639707
## 176 0.5980988246
## 177 0.5867111123
## 178 0.5782734850
## 179 0.5778554064
## 180 0.5746270502
## 181 0.5722217855
## 182 0.5536605346
## 183 0.5521690647
## 184 0.5509569812
## 185 0.5488754927
## 186 0.5487129634
## 187 0.5433550303
## 188 0.5392747586
## 189 0.5283670401
## 190 0.5261502854
## 191 0.5217659817
## 192 0.5151508064
## 193 0.5099272940
## 194 0.5065158646
## 195 0.4980173314
## 196 0.4900139620
## 197 0.4890997825
## 198 0.4883699299
## 199 0.4843773801
## 200 0.4788371644
## 201 0.4750527281
## 202 0.4700896398
## 203 0.4671871608
## 204 0.4317826503
## 205 0.4312797421
## 206 0.4283578348
## 207 0.4273188607
## 208 0.4135385180
## 209 0.4018993215
## 210 0.3975903053
## 211 0.3797717064
## 212 0.3729525963
## 213 0.3710902583
## 214 0.3640087461
## 215 0.3556022726
## 216 0.3524818493
## 217 0.3501699040
## 218 0.3325412569
## 219 0.3318527935
## 220 0.3305614279
## 221 0.3297009961
## 222 0.3284299779
## 223 0.3203439386
## 224 0.3167713590
## 225 0.3144672436
## 226 0.3122394515
## 227 0.2970222392
## 228 0.2884674973
## 229 0.2872330307
## 230 0.2773832386
## 231 0.2637491318
## 232 0.2599863490
## 233 0.2553673930
## 234 0.2488489004
## 235 0.2447769165
## 236 0.2444564544
## 237 0.2440235248
## 238 0.2357193484
## 239 0.2284054895
## 240 0.2205027326
## 241 0.1606550546
## 242 0.1573527871
## 243 0.1536775234
## 244 0.1521143472
## 245 0.1466841474
## 246 0.1439510783
## 247 0.1170505020
## 248 0.1034320547
## 249 0.1006804546
## 250 0.0950575018
## 251 0.0914037424
## 252 0.0797616585
## 253 0.0482463127
## 254 0.0343745655
## 255 0.0117975782
## 256 0.0092130528
## 257 0.0008723147
## 258 0.0000000000
## 259 0.0000000000
## 260 0.0000000000
## 261 0.0000000000
## 262 0.0000000000
## 263 0.0000000000
## 264 0.0000000000
## 265 0.0000000000
## 266 0.0000000000
## 267 0.0000000000
## 268 0.0000000000
## 269 0.0000000000
## 270 0.0000000000
## 271 0.0000000000
## 272 0.0000000000
## 273 0.0000000000
## 274 0.0000000000
## 275 0.0000000000
if (METHOD_FEATURE_FLAG == 1){
  # For the multi-class case, keep each feature's maximum importance
  # across classes and sort by it
  importance_model_LRM1_df$Feature <- rownames(importance_model_LRM1_df)
  importance_model_LRM1_df <- importance_model_LRM1_df %>%
    mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
    arrange(desc(MaxImportance))
  print(importance_model_LRM1_df)
}
if (!require(reshape2)) {
  install.packages("reshape2")
}
library(reshape2)
if (METHOD_FEATURE_FLAG == 1){
  importance_melted_LRM1_df <- importance_model_LRM1_df %>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars = "Feature", variable.name = "Class", value.name = "Importance")
  ggplot(importance_melted_LRM1_df,
         aes(x = reorder(Feature, -Importance),
             y = Importance, fill = Class)) +
    geom_bar(stat = "identity", position = "dodge") +
    coord_flip() +
    labs(title = "Feature Importance Across Classes",
         x = "Feature",
         y = "Importance",
         fill = "Class") +
    theme_minimal()
}
if (METHOD_FEATURE_FLAG == 1){
  print(importance_model_LRM1_df %>% head(20))
  print("The top 20 features based on the max-importance method:")
  print(head(importance_model_LRM1_df, n = 20)$Feature)
  importance_melted_LRM1_df <- importance_model_LRM1_df %>%
    head(20) %>%
    dplyr::select(-MaxImportance) %>%
    melt(id.vars = "Feature", variable.name = "Class", value.name = "Importance")
  ggplot(importance_melted_LRM1_df,
         aes(x = reorder(Feature, -Importance),
             y = Importance, fill = Class)) +
    geom_bar(stat = "identity", position = "dodge") +
    coord_flip() +
    labs(title = "Feature Importance Across Classes",
         x = "Feature",
         y = "Importance",
         fill = "Class") +
    theme_minimal()
}
table(df_LRM1$DX)
##
## CN MCI
## 221 333
prop.table(table(df_LRM1$DX))
##
## CN MCI
## 0.398917 0.601083
table(trainData$DX)
##
## CN MCI
## 155 234
prop.table(table(trainData$DX))
##
## CN MCI
## 0.3984576 0.6015424
barplot(table(df_LRM1$DX), main = "Whole Data Class Distribution")
For the training data set:
barplot(table(trainData$DX), main = "Train Data Class Distribution")
Let’s calculate the imbalance ratio: the number of samples in the majority class divided by the number of samples in the minority class. A high ratio indicates severe class imbalance.
class_counts <- table(df_LRM1$DX)
imbalance_ratio <- max(class_counts) / min(class_counts)
print("The imbalance ratio of the whole data set is:")
## [1] "The imbalance ratio of the whole data set is:"
print(imbalance_ratio)
## [1] 1.506787
class_counts <- table(trainData$DX)
imbalance_ratio <- max(class_counts) / min(class_counts)
print("The imbalance ratio of the training data set is:")
## [1] "The imbalance ratio of the training data set is:"
print(imbalance_ratio)
## [1] 1.509677
Let’s run a chi-square test, which can determine whether the class distribution deviates significantly from a balanced one. The p-value reported by the test indicates how significant the class imbalance is.
chisq.test(table(df_LRM1$DX))
##
## Chi-squared test for given probabilities
##
## data: table(df_LRM1$DX)
## X-squared = 22.643, df = 1, p-value = 1.951e-06
chisq.test(table(trainData$DX))
##
## Chi-squared test for given probabilities
##
## data: table(trainData$DX)
## X-squared = 16.044, df = 1, p-value = 6.19e-05
library(smotefamily)
smote_data_LGR_1 <- SMOTE(X = trainData[, !names(trainData) %in% "DX"], target = trainData$DX, K = 5, dup_size = 1)
balanced_data_LGR_1 <- smote_data_LGR_1$data
colnames(balanced_data_LGR_1)[colnames(balanced_data_LGR_1) == "class"] <- "DX"
table(balanced_data_LGR_1$DX)
##
## CN MCI
## 310 234
dim(balanced_data_LGR_1)
## [1] 544 276
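SMOTE oversamples the minority class (CN here) by interpolating between a minority sample and one of its K nearest minority-class neighbours, rather than duplicating rows. A minimal base-R sketch of the interpolation step (`synth_point` and the toy vectors are illustrative, not the smotefamily implementation):

```r
set.seed(42)

# One minority-class sample and one of its nearest minority-class neighbours
x        <- c(0.80, 0.30, 0.55)
neighbor <- c(0.70, 0.40, 0.60)

# New synthetic sample: a random point on the segment between the two
gap <- runif(1)
synth_point <- x + gap * (neighbor - x)
print(synth_point)  # lies componentwise between x and neighbor
```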
ctrl <- trainControl(method = "cv", number = 5)
model_LRM2 <- caret::train(DX ~ ., data = balanced_data_LGR_1, method = "glmnet", trControl = ctrl)
predictions <- predict(model_LRM2, newdata = testData)
caret::confusionMatrix(predictions, testData$DX)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CN MCI
## CN 53 12
## MCI 13 87
##
## Accuracy : 0.8485
## 95% CI : (0.7845, 0.8995)
## No Information Rate : 0.6
## P-Value [Acc > NIR] : 3.459e-12
##
## Kappa : 0.6835
##
## Mcnemar's Test P-Value : 1
##
## Sensitivity : 0.8030
## Specificity : 0.8788
## Pos Pred Value : 0.8154
## Neg Pred Value : 0.8700
## Prevalence : 0.4000
## Detection Rate : 0.3212
## Detection Prevalence : 0.3939
## Balanced Accuracy : 0.8409
##
## 'Positive' Class : CN
##
print(model_LRM2)
## glmnet
##
## 544 samples
## 275 predictors
## 2 classes: 'CN', 'MCI'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 435, 436, 435, 435, 435
## Resampling results across tuning parameters:
##
## alpha lambda Accuracy Kappa
## 0.10 0.0001832759 0.9337920 0.8638092
## 0.10 0.0018327587 0.9283045 0.8523472
## 0.10 0.0183275874 0.9172783 0.8290311
## 0.55 0.0001832759 0.9025314 0.7981273
## 0.55 0.0018327587 0.8988787 0.7900993
## 0.55 0.0183275874 0.8510873 0.6892913
## 1.00 0.0001832759 0.8896534 0.7719363
## 1.00 0.0018327587 0.8694699 0.7291238
## 1.00 0.0183275874 0.8142881 0.6116905
##
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0.1 and lambda = 0.0001832759.
train_predictions <- predict(model_LRM2, newdata = trainData, type = "raw")
train_accuracy <- mean(train_predictions == trainData$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 1"
mean_accuracy_model_LRM2 <- mean(model_LRM2$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_model_LRM2)
## [1] 0.889476
importance_model_LRM2 <- varImp(model_LRM2)
print(importance_model_LRM2)
## glmnet variable importance
##
## only 20 most important variables shown (out of 275)
##
## Overall
## PC2 100.00
## cg27272246 49.58
## cg00004073 49.57
## cg14710850 46.29
## cg14582632 44.66
## cg23432430 44.60
## cg07480955 44.06
## cg03924089 43.24
## cg02981548 43.16
## cg13405878 42.60
## cg02225060 41.33
## cg17129965 41.12
## cg21243064 40.78
## cg08861434 40.24
## cg14687298 39.77
## cg06833284 39.68
## cg20685672 39.65
## cg08788093 39.59
## cg11169344 39.24
## cg19471911 38.82
plot(importance_model_LRM2, top = 20, main = "Variable Importance Plot")
importance_model_LRM2_df <- importance_model_LRM2$importance
if (METHOD_FEATURE_FLAG == 3 || METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG == 5 || METHOD_FEATURE_FLAG == 6){
  importance_final_model_LRM2 <- varImp(model_LRM2$finalModel)
  library(dplyr)
  ordered_importance_final_model_LRM2 <- importance_final_model_LRM2 %>% arrange(desc(Overall))
  print(ordered_importance_final_model_LRM2)
}
## Overall
## 1 8.158983896
## 2 4.045586547
## 3 4.044332025
## 4 3.776945003
## 5 3.644208194
## ...
## 262 0.001930173
## 263 0.000000000
## ...
## 275 0.000000000
## (output truncated: 275 coefficient importances in decreasing order; rows 263-275 are exactly zero)
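The multi-class branch below selects features by taking, for each feature, the maximum importance across the per-class columns with `pmax()` and sorting by it. A minimal, self-contained sketch of that selection rule, using a toy importance table (feature names and values are made up for illustration):

```r
# Toy per-class importance table standing in for varImp(model)$importance.
imp <- data.frame(
  CN        = c(10, 55, 30),
  Dementia  = c(80, 20, 25),
  MCI       = c(15, 40, 90),
  row.names = c("cg_A", "cg_B", "cg_C")
)
imp$Feature <- rownames(imp)

# Per-feature maximum importance across classes, then sort descending.
imp$MaxImportance <- pmax(imp$CN, imp$Dementia, imp$MCI)
imp <- imp[order(-imp$MaxImportance), ]
print(imp$Feature)  # "cg_C" "cg_A" "cg_B"
```

A feature is therefore kept if it is highly informative for *any* one class, even when its importance for the other classes is low.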
if(METHOD_FEATURE_FLAG==1){
# For the multi-class case, take each feature's maximum
# importance across the classes and sort by that value.
importance_model_LRM2_df$Feature<-rownames(importance_model_LRM2_df)
importance_model_LRM2_df <- importance_model_LRM2_df %>%
mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
arrange(desc(MaxImportance))
print(importance_model_LRM2_df)
}
if(METHOD_FEATURE_FLAG == 1){
library(reshape2)  # provides melt()
importance_melted_LRM2_df <- importance_model_LRM2_df %>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_LRM2_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
if(METHOD_FEATURE_FLAG == 1){
print(importance_model_LRM2_df %>% head(20))
print("Top 20 features ranked by maximum class importance:")
print(head(importance_model_LRM2_df,n=20)$Feature)
importance_melted_LRM2_df <- importance_model_LRM2_df %>%
head(20)%>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_LRM2_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
if(METHOD_FEATURE_FLAG == 5){
prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
roc_curve <- roc(testData$DX, prob_predictions[, "MCI"], levels = rev(levels(testData$DX)))
auc_value <- roc_curve$auc
print(roc_curve)
print("The auc value is:")
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls > cases
##
## Call:
## roc.default(response = testData$DX, predictor = prob_predictions[, "MCI"], levels = rev(levels(testData$DX)))
##
## Data: prob_predictions[, "MCI"] in 99 controls (testData$DX MCI) > 66 cases (testData$DX CN).
## Area under the curve: 0.9008
## [1] "The auc value is:"
## Area under the curve: 0.9008
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
roc_curve <- roc(testData$DX, prob_predictions[, "Dementia"], levels = rev(levels(testData$DX)))
auc_value <- roc_curve$auc
print(roc_curve)
print("The auc value is:")
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
roc_curve <- roc(testData$DX, prob_predictions[, "CI"], levels = rev(levels(testData$DX)))
auc_value <- roc_curve$auc
print(roc_curve)
print("The auc value is:")
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 1){
prob_predictions <- predict(model_LRM2, newdata = testData, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(testData$DX)
for (class in classes) {
binary_labels <- ifelse(testData$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
cols <- c("blue", rainbow(length(classes) - 1))
plot(roc_curves[[1]], col = cols[1],
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = cols[i], lwd = 2)
}
legend("bottomright", legend = classes, col = cols, lwd = 2)
}
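The per-class AUC values above come from `pROC::roc`. As a sanity check, AUC also has a package-free interpretation: the probability that a randomly chosen positive case is scored higher than a randomly chosen negative one. A base-R sketch of this rank (Mann-Whitney) formulation, on toy labels and scores:

```r
# Rank-based AUC: P(score of a positive > score of a negative), ties count 1/2.
auc_rank <- function(labels, scores) {
  r <- rank(scores)  # mid-ranks handle tied scores
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

labels <- c(1, 1, 1, 0, 0)
scores <- c(0.9, 0.8, 0.3, 0.4, 0.1)
auc_rank(labels, scores)  # 5 of the 6 positive/negative pairs are ordered correctly: 5/6
```

This is the same quantity pROC reports, so it can be used to cross-check any single ROC curve above.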
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
}
df_ENM1<-processed_data
featureName_ENM1<-AfterProcess_FeatureName
library(caret)
set.seed(123)
trainIndex <- createDataPartition(df_ENM1$DX, p = 0.7, list = FALSE)
trainData_ENM1 <- df_ENM1[trainIndex, ]
testData_ENM1 <- df_ENM1[-trainIndex, ]
ctrl <- trainControl(method = "cv", number = 5)
param_grid <- expand.grid(alpha = 0:1, lambda = seq(0.001, 1, length = 20))
elastic_net_model1 <- caret::train(DX ~ ., data = trainData_ENM1, method = "glmnet",
trControl = ctrl, tuneGrid = param_grid)
print(elastic_net_model1)
## glmnet
##
## 389 samples
## 275 predictors
## 2 classes: 'CN', 'MCI'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 311, 312, 311, 311, 311
## Resampling results across tuning parameters:
##
## alpha lambda Accuracy Kappa
## 0 0.00100000 0.8790876 0.74310340
## 0 0.05357895 0.8738928 0.73083594
## 0 0.10615789 0.8815851 0.74600471
## 0 0.15873684 0.8815851 0.74557439
## 0 0.21131579 0.8867466 0.75617481
## 0 0.26389474 0.8816184 0.74461249
## 0 0.31647368 0.8841825 0.74862632
## 0 0.36905263 0.8841825 0.74862632
## 0 0.42163158 0.8841825 0.74862632
## 0 0.47421053 0.8841492 0.74763663
## 0 0.52678947 0.8841492 0.74763663
## 0 0.57936842 0.8790210 0.73585893
## 0 0.63194737 0.8738595 0.72365219
## 0 0.68452632 0.8738595 0.72365219
## 0 0.73710526 0.8661672 0.70616338
## 0 0.78968421 0.8636031 0.70002206
## 0 0.84226316 0.8532801 0.67517673
## 0 0.89484211 0.8481185 0.66264425
## 0 0.94742105 0.8404262 0.64476563
## 0 1.00000000 0.8327339 0.62654695
## 1 0.00100000 0.7685981 0.50993845
## 1 0.05357895 0.6117549 0.09849449
## 1 0.10615789 0.6015318 0.00000000
## 1 0.15873684 0.6015318 0.00000000
## 1 0.21131579 0.6015318 0.00000000
## 1 0.26389474 0.6015318 0.00000000
## 1 0.31647368 0.6015318 0.00000000
## 1 0.36905263 0.6015318 0.00000000
## 1 0.42163158 0.6015318 0.00000000
## 1 0.47421053 0.6015318 0.00000000
## 1 0.52678947 0.6015318 0.00000000
## 1 0.57936842 0.6015318 0.00000000
## 1 0.63194737 0.6015318 0.00000000
## 1 0.68452632 0.6015318 0.00000000
## 1 0.73710526 0.6015318 0.00000000
## 1 0.78968421 0.6015318 0.00000000
## 1 0.84226316 0.6015318 0.00000000
## 1 0.89484211 0.6015318 0.00000000
## 1 0.94742105 0.6015318 0.00000000
## 1 1.00000000 0.6015318 0.00000000
##
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0 and lambda = 0.2113158.
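The grid above tunes `alpha` (the ridge/lasso mix) and `lambda` (the penalty strength); the selected `alpha = 0` corresponds to a pure ridge penalty. For reference, the penalty glmnet adds to the loss is `lambda * ((1 - alpha)/2 * ||beta||_2^2 + alpha * ||beta||_1)`; a base-R sketch with an illustrative coefficient vector:

```r
# Elastic net penalty: alpha = 0 is pure ridge, alpha = 1 is pure lasso.
enet_penalty <- function(beta, alpha, lambda) {
  lambda * ((1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta)))
}

beta <- c(0.5, -0.2, 0)  # illustrative coefficients, not fitted values
enet_penalty(beta, alpha = 0, lambda = 0.2113158)  # ridge term only
enet_penalty(beta, alpha = 1, lambda = 0.2113158)  # lasso term only
```

Because the winning `alpha` is 0, no coefficient is shrunk exactly to zero here, which is why `varImp` below still reports nonzero importances for most of the 275 predictors.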
mean_accuracy_elastic_net_model1 <- mean(elastic_net_model1$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_elastic_net_model1)
## [1] 0.7411089
FeatEval_Freq_mean_accuracy_cv_ENM1<-mean_accuracy_elastic_net_model1
print(FeatEval_Freq_mean_accuracy_cv_ENM1)
## [1] 0.7411089
train_predictions <- predict(elastic_net_model1, newdata = trainData_ENM1, type = "raw")
train_accuracy <- mean(train_predictions == trainData_ENM1$DX)
FeatEval_Freq_ENM1_trainAccuracy<-train_accuracy
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 0.989717223650386"
print(FeatEval_Freq_ENM1_trainAccuracy)
## [1] 0.9897172
predictions <- predict(elastic_net_model1, newdata = testData_ENM1)
cm_FeatEval_Freq_ENM1 <- caret::confusionMatrix(predictions,testData_ENM1$DX)
print(cm_FeatEval_Freq_ENM1)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CN MCI
## CN 50 9
## MCI 16 90
##
## Accuracy : 0.8485
## 95% CI : (0.7845, 0.8995)
## No Information Rate : 0.6
## P-Value [Acc > NIR] : 3.459e-12
##
## Kappa : 0.6787
##
## Mcnemar's Test P-Value : 0.2301
##
## Sensitivity : 0.7576
## Specificity : 0.9091
## Pos Pred Value : 0.8475
## Neg Pred Value : 0.8491
## Prevalence : 0.4000
## Detection Rate : 0.3030
## Detection Prevalence : 0.3576
## Balanced Accuracy : 0.8333
##
## 'Positive' Class : CN
##
cm_FeatEval_Freq_ENM1_Accuracy<-cm_FeatEval_Freq_ENM1$overall["Accuracy"]
cm_FeatEval_Freq_ENM1_Kappa<-cm_FeatEval_Freq_ENM1$overall["Kappa"]
print(cm_FeatEval_Freq_ENM1_Accuracy)
## Accuracy
## 0.8484848
print(cm_FeatEval_Freq_ENM1_Kappa)
## Kappa
## 0.6786632
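The Kappa reported by `confusionMatrix` can be reproduced by hand from the 2x2 table printed above: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance from the row/column marginals. A base-R check using those counts:

```r
# Cohen's kappa from a confusion matrix (rows = predicted, cols = reference).
cohen_kappa <- function(cm) {
  n   <- sum(cm)
  p_o <- sum(diag(cm)) / n                     # observed agreement
  p_e <- sum(rowSums(cm) * colSums(cm)) / n^2  # chance agreement from marginals
  (p_o - p_e) / (1 - p_e)
}

# Counts from the confusion matrix printed above.
cm <- matrix(c(50, 16, 9, 90), nrow = 2,
             dimnames = list(Prediction = c("CN", "MCI"),
                             Reference  = c("CN", "MCI")))
round(cohen_kappa(cm), 4)  # 0.6787, matching the reported Kappa
```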
importance_elastic_net_model1<- varImp(elastic_net_model1)
print(importance_elastic_net_model1)
## glmnet variable importance
##
## only 20 most important variables shown (out of 275)
##
## Overall
## PC2 100.00
## cg20685672 56.91
## cg23432430 56.71
## cg27272246 54.34
## cg16652920 53.09
## cg03924089 52.85
## cg02981548 52.53
## cg09015880 51.31
## cg00962106 50.93
## cg00086247 50.38
## cg13405878 49.52
## cg02225060 49.09
## cg14710850 48.74
## cg17129965 46.32
## cg12543766 46.25
## cg14687298 46.22
## cg07028768 45.80
## cg06833284 45.62
## cg06634367 45.40
## cg17042243 45.25
plot(importance_elastic_net_model1, top = 20, main = "Variable Importance Plot")
importance_elastic_net_model1_df<-importance_elastic_net_model1$importance
if(METHOD_FEATURE_FLAG %in% c(3, 4, 5, 6)){
importance_elastic_net_final_model1 <- varImp(elastic_net_model1$finalModel)
library(dplyr)
Ordered_importance_elastic_net_final_model1 <- importance_elastic_net_final_model1 %>% arrange(desc(Overall))
print(Ordered_importance_elastic_net_final_model1)
}
## Overall
## 1 1.004156988
## 2 0.571557346
## 3 0.569497490
## 4 0.545737559
## 5 0.533196232
## ...
## 274 0.009827456
## 275 0.000178133
## (output truncated: 275 coefficient importances in decreasing order)
if(METHOD_FEATURE_FLAG==1){
# For the multi-class case, take each feature's maximum
# importance across the classes and sort by that value.
importance_elastic_net_model1_df$Feature<-rownames(importance_elastic_net_model1_df)
importance_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
arrange(desc(MaxImportance))
print(importance_elastic_net_model1_df)
}
if(METHOD_FEATURE_FLAG == 1){
importance_melted_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_elastic_net_model1_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
if(METHOD_FEATURE_FLAG == 1){
print(importance_elastic_net_model1_df %>% head(20))
print("Top 20 features ranked by maximum class importance:")
print(head(importance_elastic_net_model1_df,n=20)$Feature)
importance_melted_elastic_net_model1_df <- importance_elastic_net_model1_df %>%
head(20)%>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_elastic_net_model1_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
if(METHOD_FEATURE_FLAG == 5){
prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
roc_curve <- roc(testData_ENM1$DX,
prob_predictions[, "MCI"],
levels = rev(levels(testData_ENM1$DX)))
auc_value <- roc_curve$auc
FeatEval_Freq_ENM1_AUC<-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls > cases
## Area under the curve: 0.9236
if(METHOD_FEATURE_FLAG == 4||METHOD_FEATURE_FLAG==6){
prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
roc_curve <- roc(testData_ENM1$DX,
prob_predictions[, "Dementia"],
levels = rev(levels(testData_ENM1$DX)))
auc_value <- roc_curve$auc
FeatEval_Freq_ENM1_AUC<-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
roc_curve <- roc(testData_ENM1$DX,
prob_predictions[, "CI"],
levels = rev(levels(testData_ENM1$DX)))
auc_value <- roc_curve$auc
FeatEval_Freq_ENM1_AUC<-auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG ==1){
prob_predictions <- predict(elastic_net_model1, newdata = testData_ENM1, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(testData_ENM1$DX)
for (class in classes) {
binary_labels <- ifelse(testData_ENM1$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
cols <- c("blue", rainbow(length(classes) - 1))
plot(roc_curves[[1]], col = cols[1],
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = cols[i], lwd = 2)
}
legend("bottomright", legend = classes, col = cols, lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
FeatEval_Freq_ENM1_AUC<-mean_auc
}
print(FeatEval_Freq_ENM1_AUC)
## Area under the curve: 0.9236
library(caret)
library(xgboost)
library(dplyr)
library(doParallel)
numCores <- detectCores() - 1
c2 <- makeCluster(numCores)
registerDoParallel(c2)
df_XGB1<-processed_data
featureName_XGB1<-AfterProcess_FeatureName
set.seed(123)
trainIndex <- createDataPartition(df_XGB1$DX, p = 0.7, list = FALSE)
trainData_XGB1<- df_XGB1[trainIndex, ]
testData_XGB1 <- df_XGB1[-trainIndex, ]
cv_control <- trainControl(method = "cv", number = 5, allowParallel = TRUE)
xgb_model <- caret::train(
DX ~ ., data = trainData_XGB1,
method = "xgbTree", trControl = cv_control,
metric = "Accuracy"
)
print(xgb_model)
## eXtreme Gradient Boosting
##
## 389 samples
## 275 predictors
## 2 classes: 'CN', 'MCI'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 311, 312, 311, 311, 311
## Resampling results across tuning parameters:
##
## eta max_depth colsample_bytree subsample nrounds Accuracy Kappa
## 0.3 1 0.6 0.50 50 0.6195804 0.1831054
## 0.3 1 0.6 0.50 100 0.6554113 0.2719526
## 0.3 1 0.6 0.50 150 0.6606727 0.2805742
## 0.3 1 0.6 0.75 50 0.5989344 0.1193267
## 0.3 1 0.6 0.75 100 0.6014652 0.1342076
## 0.3 1 0.6 0.75 150 0.6657010 0.2835821
## 0.3 1 0.6 1.00 50 0.6092241 0.1456780
## 0.3 1 0.6 1.00 100 0.6477855 0.2410756
## 0.3 1 0.6 1.00 150 0.6632035 0.2753545
## 0.3 1 0.8 0.50 50 0.6246087 0.1931252
## 0.3 1 0.8 0.50 100 0.7043290 0.3633589
## 0.3 1 0.8 0.50 150 0.7070263 0.3747451
## 0.3 1 0.8 0.75 50 0.6247086 0.1841832
## 0.3 1 0.8 0.75 100 0.6477522 0.2424623
## 0.3 1 0.8 0.75 150 0.6940393 0.3472617
## 0.3 1 0.8 1.00 50 0.6013653 0.1360248
## 0.3 1 0.8 1.00 100 0.6528805 0.2498411
## 0.3 1 0.8 1.00 150 0.6759907 0.3063892
## 0.3 2 0.6 0.50 50 0.6345987 0.2175954
## 0.3 2 0.6 0.50 100 0.6734266 0.2998494
## 0.3 2 0.6 0.50 150 0.6888445 0.3346837
## 0.3 2 0.6 0.75 50 0.6553447 0.2626974
## 0.3 2 0.6 0.75 100 0.6681319 0.2928050
## 0.3 2 0.6 0.75 150 0.6811189 0.3143791
## 0.3 2 0.6 1.00 50 0.6451215 0.2247644
## 0.3 2 0.6 1.00 100 0.6708625 0.2819403
## 0.3 2 0.6 1.00 150 0.6657676 0.2752443
## 0.3 2 0.8 0.50 50 0.6324342 0.2273855
## 0.3 2 0.8 0.50 100 0.6606394 0.2735651
## 0.3 2 0.8 0.50 150 0.6658009 0.2880361
## 0.3 2 0.8 0.75 50 0.6529471 0.2559094
## 0.3 2 0.8 0.75 100 0.6992674 0.3505816
## 0.3 2 0.8 0.75 150 0.6915085 0.3299005
## 0.3 2 0.8 1.00 50 0.6426573 0.2196727
## 0.3 2 0.8 1.00 100 0.6631702 0.2606419
## 0.3 2 0.8 1.00 150 0.6426240 0.2230989
## 0.3 3 0.6 0.50 50 0.6580420 0.2563456
## 0.3 3 0.6 0.50 100 0.6734599 0.2834143
## 0.3 3 0.6 0.50 150 0.6887446 0.3191616
## 0.3 3 0.6 0.75 50 0.7017649 0.3510287
## 0.3 3 0.6 0.75 100 0.7043956 0.3604205
## 0.3 3 0.6 0.75 150 0.6992008 0.3522486
## 0.3 3 0.6 1.00 50 0.6477522 0.2261938
## 0.3 3 0.6 1.00 100 0.6812188 0.2954398
## 0.3 3 0.6 1.00 150 0.6966367 0.3294476
## 0.3 3 0.8 0.50 50 0.6554446 0.2467848
## 0.3 3 0.8 0.50 100 0.6888112 0.3153816
## 0.3 3 0.8 0.50 150 0.6913420 0.3268534
## 0.3 3 0.8 0.75 50 0.6270396 0.1924254
## 0.3 3 0.8 0.75 100 0.6450882 0.2312230
## 0.3 3 0.8 0.75 150 0.6657010 0.2752946
## 0.3 3 0.8 1.00 50 0.6450882 0.2183279
## 0.3 3 0.8 1.00 100 0.6503497 0.2312043
## 0.3 3 0.8 1.00 150 0.6631702 0.2571683
## 0.4 1 0.6 0.50 50 0.6038961 0.1538924
## 0.4 1 0.6 0.50 100 0.6551782 0.2662582
## 0.4 1 0.6 0.50 150 0.6835498 0.3259852
## 0.4 1 0.6 0.75 50 0.6115884 0.1649314
## 0.4 1 0.6 0.75 100 0.6296037 0.2010936
## 0.4 1 0.6 0.75 150 0.6707959 0.2906577
## 0.4 1 0.6 1.00 50 0.6117882 0.1557754
## 0.4 1 0.6 1.00 100 0.6297702 0.2042566
## 0.4 1 0.6 1.00 150 0.6477855 0.2460278
## 0.4 1 0.8 0.50 50 0.6581752 0.2752748
## 0.4 1 0.8 0.50 100 0.6425574 0.2455820
## 0.4 1 0.8 0.50 150 0.6914752 0.3497238
## 0.4 1 0.8 0.75 50 0.6170829 0.1925769
## 0.4 1 0.8 0.75 100 0.6427239 0.2417809
## 0.4 1 0.8 0.75 150 0.6786547 0.3159856
## 0.4 1 0.8 1.00 50 0.6013986 0.1378337
## 0.4 1 0.8 1.00 100 0.6298035 0.2125193
## 0.4 1 0.8 1.00 150 0.6425574 0.2409853
## 0.4 2 0.6 0.50 50 0.6657010 0.2836986
## 0.4 2 0.6 0.50 100 0.6914419 0.3435955
## 0.4 2 0.6 0.50 150 0.7095238 0.3785075
## 0.4 2 0.6 0.75 50 0.6528472 0.2655094
## 0.4 2 0.6 0.75 100 0.6940726 0.3520946
## 0.4 2 0.6 0.75 150 0.6863470 0.3365679
## 0.4 2 0.6 1.00 50 0.6195471 0.1924584
## 0.4 2 0.6 1.00 100 0.6528805 0.2544478
## 0.4 2 0.6 1.00 150 0.6503497 0.2504922
## 0.4 2 0.8 0.50 50 0.6348318 0.2177916
## 0.4 2 0.8 0.50 100 0.6786547 0.3135940
## 0.4 2 0.8 0.50 150 0.6888778 0.3343505
## 0.4 2 0.8 0.75 50 0.6194139 0.1838065
## 0.4 2 0.8 0.75 100 0.6555112 0.2570094
## 0.4 2 0.8 0.75 150 0.6734932 0.2944701
## 0.4 2 0.8 1.00 50 0.6270396 0.1863258
## 0.4 2 0.8 1.00 100 0.6553114 0.2473063
## 0.4 2 0.8 1.00 150 0.6732934 0.2889508
## 0.4 3 0.6 0.50 50 0.6451881 0.2311668
## 0.4 3 0.6 0.50 100 0.6478188 0.2433042
## 0.4 3 0.6 0.50 150 0.6555112 0.2588965
## 0.4 3 0.6 0.75 50 0.7094239 0.3635522
## 0.4 3 0.6 0.75 100 0.7119880 0.3721184
## 0.4 3 0.6 0.75 150 0.7196470 0.3874689
## 0.4 3 0.6 1.00 50 0.6555778 0.2408767
## 0.4 3 0.6 1.00 100 0.6684649 0.2743292
## 0.4 3 0.6 1.00 150 0.6786547 0.2964161
## 0.4 3 0.8 0.50 50 0.6452214 0.2309026
## 0.4 3 0.8 0.50 100 0.6555445 0.2505798
## 0.4 3 0.8 0.50 150 0.6760906 0.2964534
## 0.4 3 0.8 0.75 50 0.6555445 0.2444101
## 0.4 3 0.8 0.75 100 0.6658675 0.2753228
## 0.4 3 0.8 0.75 150 0.6735931 0.2944940
## 0.4 3 0.8 1.00 50 0.6270396 0.1943746
## 0.4 3 0.8 1.00 100 0.6450882 0.2312066
## 0.4 3 0.8 1.00 150 0.6502831 0.2474929
##
## Tuning parameter 'gamma' was held constant at a value of 0
## Tuning parameter 'min_child_weight' was held constant at a value of 1
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were nrounds = 150, max_depth = 3, eta = 0.4, gamma = 0, colsample_bytree = 0.6, min_child_weight = 1 and subsample = 0.75.
mean_accuracy_xgb_model<- mean(xgb_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_xgb_model)
## [1] 0.6586691
FeatEval_Freq_mean_accuracy_cv_xgb<-mean_accuracy_xgb_model
print(FeatEval_Freq_mean_accuracy_cv_xgb)
## [1] 0.6586691
train_predictions <- predict(xgb_model, newdata = trainData_XGB1, type = "raw")
train_accuracy <- mean(train_predictions == trainData_XGB1$DX)
FeatEval_Freq_xgb_trainAccuracy <- train_accuracy
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 1"
print(FeatEval_Freq_xgb_trainAccuracy)
## [1] 1
predictions <- predict(xgb_model, newdata = testData_XGB1)
cm_FeatEval_Freq_xgb <-caret::confusionMatrix(predictions,testData_XGB1$DX)
print(cm_FeatEval_Freq_xgb)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CN MCI
## CN 39 18
## MCI 27 81
##
## Accuracy : 0.7273
## 95% CI : (0.6526, 0.7936)
## No Information Rate : 0.6
## P-Value [Acc > NIR] : 0.0004353
##
## Kappa : 0.4186
##
## Mcnemar's Test P-Value : 0.2330380
##
## Sensitivity : 0.5909
## Specificity : 0.8182
## Pos Pred Value : 0.6842
## Neg Pred Value : 0.7500
## Prevalence : 0.4000
## Detection Rate : 0.2364
## Detection Prevalence : 0.3455
## Balanced Accuracy : 0.7045
##
## 'Positive' Class : CN
##
cm_FeatEval_Freq_xgb_Accuracy <-cm_FeatEval_Freq_xgb$overall["Accuracy"]
cm_FeatEval_Freq_xgb_Kappa <-cm_FeatEval_Freq_xgb$overall["Kappa"]
print(cm_FeatEval_Freq_xgb_Accuracy)
## Accuracy
## 0.7272727
print(cm_FeatEval_Freq_xgb_Kappa)
## Kappa
## 0.4186047
importance_xgb_model<- varImp(xgb_model)
print(importance_xgb_model)
## xgbTree variable importance
##
## only 20 most important variables shown (out of 275)
##
## Overall
## age.now 100.00
## cg00962106 70.99
## cg12543766 68.87
## cg23836570 63.34
## cg21812850 60.19
## cg06880438 58.54
## cg03924089 57.12
## cg16655091 55.65
## cg07480955 55.43
## cg00004073 55.36
## cg06833284 53.84
## cg16202259 53.22
## cg11438323 53.15
## cg02621446 51.20
## cg04412904 48.63
## cg06483046 48.02
## cg03549208 47.57
## cg16771215 43.17
## cg14687298 43.10
## cg25598710 41.42
plot(importance_xgb_model, top = 20, main = "Variable Importance Plot")
importance_xgb_model_df<-importance_xgb_model$importance
importance <- xgb.importance(model = xgb_model$finalModel)
xgb.plot.importance(importance_matrix = importance)
ordered_importance <- importance[order(-importance$Importance), ]
print(ordered_importance)
## Feature Gain Cover Frequency Importance
## <char> <num> <num> <num> <num>
## 1: age.now 3.233778e-02 0.0364283865 0.023706897 3.233778e-02
## 2: cg00962106 2.295531e-02 0.0082334618 0.006465517 2.295531e-02
## 3: cg12543766 2.227097e-02 0.0326303448 0.017241379 2.227097e-02
## 4: cg23836570 2.048147e-02 0.0204166244 0.008620690 2.048147e-02
## 5: cg21812850 1.946391e-02 0.0074455575 0.008620690 1.946391e-02
## ---
## 199: cg02887598 8.465138e-05 0.0005949789 0.002155172 8.465138e-05
## 200: cg17061760 7.163670e-05 0.0004949002 0.002155172 7.163670e-05
## 201: cg17653352 5.942399e-05 0.0004235992 0.002155172 5.942399e-05
## 202: cg22071943 4.206911e-05 0.0006235099 0.002155172 4.206911e-05
## 203: cg24470466 8.473424e-06 0.0004980855 0.002155172 8.473424e-06
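Note that `varImp` above reports importances scaled so the strongest predictor is 100, while `xgb.importance` reports raw Gain. The bridge between the two appears to be a simple min-max rescale to [0, 100]; applying it to the (rounded) Gain values printed above approximately recovers the caret numbers (e.g. cg00962106 comes out near the 70.99 shown). A sketch:

```r
# Min-max rescale of raw importances to caret's 0-100 convention.
scale_importance <- function(x) 100 * (x - min(x)) / (max(x) - min(x))

# Rounded Gain values taken from the xgb.importance output above.
gain <- c(age.now = 3.233778e-02, cg00962106 = 2.295531e-02, cg24470466 = 8.473424e-06)
round(scale_importance(gain), 2)
```

The small discrepancy from the caret table comes from caret rescaling over all 275 predictors, not just the three shown here.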
stopCluster(c2)
registerDoSEQ()
if(METHOD_FEATURE_FLAG == 5){
prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
roc_curve <- roc(testData_XGB1$DX,
prob_predictions[, "MCI"],
levels = rev(levels(testData_XGB1$DX)))
auc_value <- roc_curve$auc
FeatEval_Freq_xgb_AUC <- auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls > cases
## Area under the curve: 0.7698
if(METHOD_FEATURE_FLAG == 4 || METHOD_FEATURE_FLAG==6){
prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
roc_curve <- roc(testData_XGB1$DX,
prob_predictions[, "Dementia"],
levels = rev(levels(testData_XGB1$DX)))
auc_value <- roc_curve$auc
FeatEval_Freq_xgb_AUC <- auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
roc_curve <- roc(testData_XGB1$DX,
prob_predictions[, "CI"],
levels = rev(levels(testData_XGB1$DX)))
auc_value <- roc_curve$auc
FeatEval_Freq_xgb_AUC <- auc_value
print(auc_value)
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
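The three binary branches above differ only in which probability column is treated as the positive class. A small helper removes the repetition (a sketch; `plot_binary_roc` is a name introduced here, and pROC is assumed to be loaded as elsewhere in this file):

```r
# Hypothetical helper: compute and plot a binary ROC curve for a given
# positive class, returning the AUC (mirrors the branches above).
plot_binary_roc <- function(model, test_df, positive_class) {
  probs <- predict(model, newdata = test_df, type = "prob")
  rc <- pROC::roc(test_df$DX, probs[, positive_class],
                  levels = rev(levels(test_df$DX)))
  plot(rc, col = "blue", lwd = 2, main = "ROC Curve")
  rc$auc
}
# e.g. FeatEval_Freq_xgb_AUC <- plot_binary_roc(xgb_model, testData_XGB1, "MCI")
```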
if (METHOD_FEATURE_FLAG == 1){
prob_predictions <- predict(xgb_model, newdata = testData_XGB1, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(testData_XGB1$DX)
for (class in classes) {
binary_labels <- ifelse(testData_XGB1$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
plot(roc_curves[[1]], col = "blue",
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = i+1, lwd = 2)
}
legend("bottomright", legend = classes, col = c("blue", seq(3, length(classes) + 1)), lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
FeatEval_Freq_xgb_AUC <- mean_auc
}
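As an alternative to the hand-rolled one-versus-rest loop, pROC also provides `multiclass.roc()`, which accepts the full class-probability matrix and returns an averaged pairwise AUC (a sketch, assuming `prob_predictions` has one column per level of `DX` as produced by `predict(..., type = "prob")`):

```r
# Sketch: multiclass AUC directly from the class-probability matrix.
# multiclass.roc() averages the pairwise AUCs.
mc_roc <- pROC::multiclass.roc(testData_XGB1$DX, prob_predictions)
print(mc_roc$auc)
```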
print(FeatEval_Freq_xgb_AUC)
## Area under the curve: 0.7698
library(caret)
library(randomForest)
df_RFM1<-processed_data
featureName_RFM1<-AfterProcess_FeatureName
set.seed(123)
trainIndex <- createDataPartition(df_RFM1$DX, p = 0.7, list = FALSE)
train_data_RFM1 <- df_RFM1[trainIndex, ]
test_data_RFM1 <- df_RFM1[-trainIndex, ]
X_train_RFM1 <- subset(train_data_RFM1, select = -DX)
y_train_RFM1 <- train_data_RFM1$DX
X_test_RFM1 <- subset(test_data_RFM1, select = -DX)
y_test_RFM1 <- test_data_RFM1$DX
ctrl <- trainControl(method = "cv", number = 5, classProbs = TRUE)
rf_model <- caret::train(
DX ~ ., data = train_data_RFM1,
method = "rf", trControl = ctrl,
metric = "Accuracy",
importance = TRUE
)
print(rf_model)
## Random Forest
##
## 389 samples
## 275 predictors
## 2 classes: 'CN', 'MCI'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 311, 312, 311, 311, 311
## Resampling results across tuning parameters:
##
## mtry Accuracy Kappa
## 2 0.6169497 0.04847422
## 138 0.6452880 0.14944089
## 275 0.6452214 0.14677487
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 138.
mean_accuracy_rf_model<- mean(rf_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_rf_model)
## [1] 0.6358197
FeatEval_Freq_mean_accuracy_cv_rf<-mean_accuracy_rf_model
print(FeatEval_Freq_mean_accuracy_cv_rf)
## [1] 0.6358197
train_predictions <- predict(rf_model, newdata = train_data_RFM1, type = "raw")
train_accuracy <- mean(train_predictions == train_data_RFM1$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 1"
FeatEval_Freq_rf_trainAccuracy<-train_accuracy
print(FeatEval_Freq_rf_trainAccuracy)
## [1] 1
predictions <- predict(rf_model, newdata = test_data_RFM1)
cm_FeatEval_Freq_rf<-caret::confusionMatrix(predictions,test_data_RFM1$DX)
print(cm_FeatEval_Freq_rf)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CN MCI
## CN 13 5
## MCI 53 94
##
## Accuracy : 0.6485
## 95% CI : (0.5704, 0.7211)
## No Information Rate : 0.6
## P-Value [Acc > NIR] : 0.1162
##
## Kappa : 0.1667
##
## Mcnemar's Test P-Value : 6.769e-10
##
## Sensitivity : 0.19697
## Specificity : 0.94949
## Pos Pred Value : 0.72222
## Neg Pred Value : 0.63946
## Prevalence : 0.40000
## Detection Rate : 0.07879
## Detection Prevalence : 0.10909
## Balanced Accuracy : 0.57323
##
## 'Positive' Class : CN
##
cm_FeatEval_Freq_rf_Accuracy<-cm_FeatEval_Freq_rf$overall["Accuracy"]
print(cm_FeatEval_Freq_rf_Accuracy)
## Accuracy
## 0.6484848
cm_FeatEval_Freq_rf_Kappa<-cm_FeatEval_Freq_rf$overall["Kappa"]
print(cm_FeatEval_Freq_rf_Kappa)
## Kappa
## 0.1666667
importance_rf_model <- varImp(rf_model)
print(importance_rf_model)
## rf variable importance
##
## only 20 most important variables shown (out of 275)
##
## Importance
## age.now 100.00
## cg12543766 96.97
## cg23836570 90.12
## cg16652920 86.33
## cg03749159 85.56
## cg08914944 85.20
## cg03924089 84.10
## cg24883219 84.02
## cg06286533 83.14
## cg06864789 82.34
## cg24139837 82.01
## cg20208879 81.04
## cg08857872 80.48
## cg04728936 79.74
## cg24851651 79.54
## cg14710850 78.51
## cg04412904 78.15
## cg02464073 77.65
## cg23066280 76.72
## cg00553601 75.95
plot(importance_rf_model, top = 20, main = "Variable Importance Plot")
importance_rf_model_df<-importance_rf_model$importance
if( METHOD_FEATURE_FLAG==5){
importance_rf_final_model <- varImp(rf_model$finalModel)
library(dplyr)
Ordered_importance_rf_final_model <- importance_rf_final_model %>% arrange(desc(MCI))
print(Ordered_importance_rf_final_model)
}
## CN MCI
## 1 2.67165588 2.67165588
## 2 2.53305292 2.53305292
## 3 2.21935380 2.21935380
## 4 2.04567517 2.04567517
## 5 2.01032689 2.01032689
## ---
## 275 -1.90820292 -1.90820292
if( METHOD_FEATURE_FLAG==4||METHOD_FEATURE_FLAG==6){
importance_rf_final_model <- varImp(rf_model$finalModel)
library(dplyr)
Ordered_importance_rf_final_model <- importance_rf_final_model %>% arrange(desc(Dementia))
print(Ordered_importance_rf_final_model)
}
if( METHOD_FEATURE_FLAG==3){
importance_rf_final_model <- varImp(rf_model$finalModel)
library(dplyr)
Ordered_importance_rf_final_model <- importance_rf_final_model %>% arrange(desc(CI))
print(Ordered_importance_rf_final_model)
}
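Note that `dplyr::arrange()` drops data-frame row names, which is why the ordered importance table above prints numeric row indices instead of CpG identifiers. A hedged fix is to copy the row names into a column before sorting (`tibble::rownames_to_column()` is assumed available; tibble is installed alongside dplyr):

```r
library(dplyr)
library(tibble)
importance_rf_final_model <- varImp(rf_model$finalModel)
Ordered_importance_rf_final_model <- importance_rf_final_model %>%
  rownames_to_column("Feature") %>%  # keep CpG names before arrange() drops them
  arrange(desc(MCI))
head(Ordered_importance_rf_final_model)
```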
if(METHOD_FEATURE_FLAG==1){
# for the multi classification case,
# for each feature, we will choose the maximum importance value
# Add a column for the maximum importance
importance_rf_model_df$Feature<-rownames(importance_rf_model_df)
importance_rf_model_df <- importance_rf_model_df %>%
mutate(MaxImportance = pmax(CN, Dementia, MCI)) %>%
arrange(desc(MaxImportance))
print(importance_rf_model_df)
}
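Taking the per-feature maximum across classes is one aggregation choice; averaging across classes is a common alternative that weights all classes equally. A sketch against the same columns (`MeanImportance` is a name introduced here):

```r
# Alternative aggregation: mean importance across the three classes.
importance_rf_model_df_mean <- importance_rf_model_df %>%
  mutate(MeanImportance = rowMeans(across(c(CN, Dementia, MCI)))) %>%
  arrange(desc(MeanImportance))
print(head(importance_rf_model_df_mean, 20))
```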
if(METHOD_FEATURE_FLAG == 1){
library(reshape2)
importance_melted_rf_model_df <- importance_rf_model_df %>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_rf_model_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
if(METHOD_FEATURE_FLAG == 1){
print(importance_rf_model_df %>% head(20))
print("the top 20 features based on max way:")
print(head(importance_rf_model_df,n=20)$Feature)
importance_melted_rf_model_df <- importance_rf_model_df %>%
head(20)%>%
dplyr::select(-MaxImportance) %>%
melt(id.vars ="Feature", variable.name = "Class", value.name = "Importance")
ggplot(importance_melted_rf_model_df,
aes(x = reorder(Feature, -Importance),
y = Importance, fill = Class)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Feature Importance Across Classes",
x = "Feature",
y = "Importance",
fill = "Class") +
theme_minimal()
}
if(METHOD_FEATURE_FLAG == 5){
prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
roc_curve <- roc(test_data_RFM1$DX,
prob_predictions[, "MCI"],
levels = rev(levels(test_data_RFM1$DX)))
auc_value <- roc_curve$auc
print(auc_value)
FeatEval_Freq_rf_AUC<-auc_value
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls > cases
## Area under the curve: 0.7697
if(METHOD_FEATURE_FLAG == 4||METHOD_FEATURE_FLAG==6){
prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
roc_curve <- roc(test_data_RFM1$DX,
prob_predictions[, "Dementia"],
levels = rev(levels(test_data_RFM1$DX)))
auc_value <- roc_curve$auc
print(auc_value)
FeatEval_Freq_rf_AUC<-auc_value
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
roc_curve <- roc(test_data_RFM1$DX,
prob_predictions[, "CI"],
levels = rev(levels(test_data_RFM1$DX)))
auc_value <- roc_curve$auc
print(auc_value)
FeatEval_Freq_rf_AUC<-auc_value
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 1){
prob_predictions <- predict(rf_model, newdata = test_data_RFM1, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(test_data_RFM1$DX)
for (class in classes) {
binary_labels <- ifelse(test_data_RFM1$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
plot(roc_curves[[1]], col = "blue",
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = i+1, lwd = 2)
}
legend("bottomright", legend = classes, col = c("blue", seq(3, length(classes) + 1)), lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
FeatEval_Freq_rf_AUC<-mean_auc
}
print(FeatEval_Freq_rf_AUC)
## Area under the curve: 0.7697
df_SVM<-processed_data
featureName_SVM1<-AfterProcess_FeatureName
trainIndex <- createDataPartition(df_SVM$DX, p = 0.7, list = FALSE)
train_data_SVM1 <- df_SVM[trainIndex, ]
test_data_SVM1 <- df_SVM[-trainIndex, ]
X_train_SVM1 <- subset(train_data_SVM1,select = -DX)
y_train_SVM1 <- train_data_SVM1$DX
X_test_SVM1 <- subset(test_data_SVM1, select= -DX )
y_test_SVM1 <- test_data_SVM1$DX
train_control <- trainControl(method = "cv", number = 5, classProbs = TRUE)
svm_model <- caret::train(DX ~ ., data = train_data_SVM1,
method = "svmRadial",
trControl = train_control)
print(svm_model)
## Support Vector Machines with Radial Basis Function Kernel
##
## 389 samples
## 275 predictors
## 2 classes: 'CN', 'MCI'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 312, 311, 311, 311, 311
## Resampling results across tuning parameters:
##
## C Accuracy Kappa
## 0.25 0.8123543 0.6248119
## 0.50 0.8226773 0.6456629
## 1.00 0.8329337 0.6572412
##
## Tuning parameter 'sigma' was held constant at a value of 0.001833521
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were sigma = 0.001833521 and C = 1.
print(svm_model$bestTune)
## sigma C
## 3 0.001833521 1
mean_accuracy_svm_model<- mean(svm_model$results$Accuracy)
print("The Mean accuracy of resampling results across tuning parameters is:")
## [1] "The Mean accuracy of resampling results across tuning parameters is:"
print(mean_accuracy_svm_model)
## [1] 0.8226551
FeatEval_Freq_mean_accuracy_cv_svm<-mean_accuracy_svm_model
print(FeatEval_Freq_mean_accuracy_cv_svm)
## [1] 0.8226551
train_predictions <- predict(svm_model, newdata = train_data_SVM1)
train_accuracy <- mean(train_predictions == train_data_SVM1$DX)
print(paste("Training Accuracy: ", train_accuracy))
## [1] "Training Accuracy: 0.992287917737789"
FeatEval_Freq_svm_trainAccuracy <- train_accuracy
print(FeatEval_Freq_svm_trainAccuracy)
## [1] 0.9922879
predictions <- predict(svm_model, newdata = test_data_SVM1)
cm_FeatEval_Freq_svm<-caret::confusionMatrix(predictions,test_data_SVM1$DX)
print(cm_FeatEval_Freq_svm)
## Confusion Matrix and Statistics
##
## Reference
## Prediction CN MCI
## CN 58 13
## MCI 8 86
##
## Accuracy : 0.8727
## 95% CI : (0.8121, 0.9195)
## No Information Rate : 0.6
## P-Value [Acc > NIR] : 1.212e-14
##
## Kappa : 0.7382
##
## Mcnemar's Test P-Value : 0.3827
##
## Sensitivity : 0.8788
## Specificity : 0.8687
## Pos Pred Value : 0.8169
## Neg Pred Value : 0.9149
## Prevalence : 0.4000
## Detection Rate : 0.3515
## Detection Prevalence : 0.4303
## Balanced Accuracy : 0.8737
##
## 'Positive' Class : CN
##
cm_FeatEval_Freq_svm_Accuracy <- cm_FeatEval_Freq_svm$overall["Accuracy"]
cm_FeatEval_Freq_svm_Kappa <- cm_FeatEval_Freq_svm$overall["Kappa"]
print(cm_FeatEval_Freq_svm_Accuracy)
## Accuracy
## 0.8727273
print(cm_FeatEval_Freq_svm_Kappa)
## Kappa
## 0.7381546
Let’s take a look at the feature importance of the trained model.
library(iml)
predictor_SVM <- Predictor$new(svm_model,data = df_SVM,y=df_SVM$DX)
importance_SVM <- FeatureImp$new(predictor_SVM,loss="ce")
print(importance_SVM)
## Interpretation method: FeatureImp
## error function: ce
##
## Analysed predictor:
## Prediction task: classification
## Classes:
##
## Analysed data:
## Sampling from data.frame with 554 rows and 276 columns.
##
##
## Head of results:
## feature importance.05 importance importance.95 permutation.error
## 1 cg16571124 1.0500000 1.125000 1.125000 0.04873646
## 2 cg26081710 0.9666667 1.083333 1.083333 0.04693141
## 3 cg23432430 1.0083333 1.083333 1.150000 0.04693141
## 4 cg00962106 1.0416667 1.083333 1.116667 0.04693141
## 5 cg25879395 1.0083333 1.083333 1.125000 0.04693141
## 6 cg21209485 1.0416667 1.083333 1.116667 0.04693141
plot(importance_SVM)
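`FeatureImp` permutes each feature only a few times by default, so the ranking can vary between runs; raising `n.repetitions` trades compute time for more stable estimates (a sketch reusing the `predictor_SVM` object from above):

```r
# More permutation repetitions for a more stable importance estimate.
importance_SVM_stable <- FeatureImp$new(predictor_SVM, loss = "ce",
                                        n.repetitions = 20)
plot(importance_SVM_stable)
```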
library(vip)
vip(svm_model, method = "permute", train = train_data_SVM1,
    target = "DX", nsim = 10, metric = "bal_accuracy",
    pred_wrapper = predict)
importance_SVM_df<-importance_SVM$results
if(METHOD_FEATURE_FLAG == 5){
library(e1071)
prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
roc_curve <- roc(test_data_SVM1$DX,
prob_predictions[, "MCI"],
levels = rev(levels(test_data_SVM1$DX)))
print(roc_curve)
print("The AUC value is:")
auc_value <- roc_curve$auc
print(auc_value)
FeatEval_Freq_svm_AUC<-auc_value
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
## Setting direction: controls > cases
##
## Call:
## roc.default(response = test_data_SVM1$DX, predictor = prob_predictions[, "MCI"], levels = rev(levels(test_data_SVM1$DX)))
##
## Data: prob_predictions[, "MCI"] in 99 controls (test_data_SVM1$DX MCI) > 66 cases (test_data_SVM1$DX CN).
## Area under the curve: 0.9532
## [1] "The AUC value is:"
## Area under the curve: 0.9532
if(METHOD_FEATURE_FLAG == 4||METHOD_FEATURE_FLAG==6){
library(e1071)
prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
roc_curve <- roc(test_data_SVM1$DX,
prob_predictions[, "Dementia"],
levels = rev(levels(test_data_SVM1$DX)))
print(roc_curve)
print("The AUC value is:")
auc_value <- roc_curve$auc
print(auc_value)
FeatEval_Freq_svm_AUC<-auc_value
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if(METHOD_FEATURE_FLAG == 3){
library(e1071)
prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
roc_curve <- roc(test_data_SVM1$DX,
prob_predictions[, "CI"],
levels = rev(levels(test_data_SVM1$DX)))
print(roc_curve)
print("The AUC value is:")
auc_value <- roc_curve$auc
print(auc_value)
FeatEval_Freq_svm_AUC<-auc_value
plot(roc_curve, col = "blue", lwd = 2, main = "ROC Curve")
}
if (METHOD_FEATURE_FLAG == 1){
prob_predictions <- predict(svm_model, newdata = test_data_SVM1, type = "prob")
roc_curves <- list()
auc_values <- numeric()
classes <- levels(test_data_SVM1$DX)
for (class in classes) {
binary_labels <- ifelse(test_data_SVM1$DX == class, 1, 0)
roc_curve <- roc(binary_labels, prob_predictions[, class])
roc_curves[[class]] <- roc_curve
auc_values[class] <- roc_curve$auc
}
for (class in classes) {
cat("Class:", class, "\n")
print(roc_curves[[class]])
cat("The AUC value for class", class, "is:", auc_values[class], "\n\n")
}
plot(roc_curves[[1]], col = "blue",
lwd = 2,
main = "One versus Rest - ROC Curve for Each Class")
for (i in 2:length(classes)) {
lines(roc_curves[[i]], col = i+1, lwd = 2)
}
legend("bottomright", legend = classes, col = c("blue", seq(3, length(classes) + 1)), lwd = 2)
}
if(METHOD_FEATURE_FLAG ==1){
mean_auc <- mean(auc_values)
cat("The mean AUC value across all classes with one versus rest method is:",
mean_auc, "\n")
FeatEval_Freq_svm_AUC<-mean_auc
}
print(FeatEval_Freq_svm_AUC)
## Area under the curve: 0.9532
In the INPUT Session, “Metrics_Table_Output_FLAG” controls whether the metrics in this file are written out: the model-training-stage metrics and the performance metrics of the key features selected by the mean, median, and frequency methods.
Feature_and_model_Metrics <- c("Training Accuracy", "Test Accuracy", "Test Kappa", "AUC", "Average Test Accuracy during Cross Validation")
ModelTrain_stage_Logistic_metrics_ModelTrainStage <- c(modelTrain_LRM1_trainAccuracy, cm_modelTrain_LRM1_Accuracy, cm_modelTrain_LRM1_Kappa,modelTrain_LRM1_AUC, modelTrain_mean_accuracy_cv_LRM1)
ModelTrain_stage_Logistic_metrics_Feature_Mean<-c(FeatEval_Mean_LRM1_trainAccuracy,
cm_FeatEval_Mean_LRM1_Accuracy,cm_FeatEval_Mean_LRM1_Kappa,FeatEval_Mean_LRM1_AUC, FeatEval_Mean_mean_accuracy_cv_LRM1)
ModelTrain_stage_Logistic_metrics_Feature_Median<-c(FeatEval_Median_LRM1_trainAccuracy,
cm_FeatEval_Median_LRM1_Accuracy,cm_FeatEval_Median_LRM1_Kappa,FeatEval_Median_LRM1_AUC, FeatEval_Median_mean_accuracy_cv_LRM1)
ModelTrain_stage_Logistic_metrics_Feature_Freq<-c(FeatEval_Freq_LRM1_trainAccuracy,
cm_FeatEval_Freq_LRM1_Accuracy,cm_FeatEval_Freq_LRM1_Kappa,FeatEval_Freq_LRM1_AUC,FeatEval_Freq_mean_accuracy_cv_LRM1)
ModelTrain_stage_Logistic_metrics<-c(ModelTrain_stage_Logistic_metrics_ModelTrainStage, ModelTrain_stage_Logistic_metrics_Feature_Mean,ModelTrain_stage_Logistic_metrics_Feature_Median,ModelTrain_stage_Logistic_metrics_Feature_Freq)
ModelTrain_stage_ElasticNet_metrics_ModelTrainStage <- c(modelTrain_ENM1_trainAccuracy, cm_modelTrain_ENM1_Accuracy, cm_modelTrain_ENM1_Kappa,modelTrain_ENM1_AUC, modelTrain_mean_accuracy_cv_ENM1)
ModelTrain_stage_ElasticNet_metrics_Feature_Mean<-c(FeatEval_Mean_ENM1_trainAccuracy,
cm_FeatEval_Mean_ENM1_Accuracy,cm_FeatEval_Mean_ENM1_Kappa,FeatEval_Mean_ENM1_AUC, FeatEval_Mean_mean_accuracy_cv_ENM1)
ModelTrain_stage_ElasticNet_metrics_Feature_Median<-c(FeatEval_Median_ENM1_trainAccuracy,
cm_FeatEval_Median_ENM1_Accuracy,cm_FeatEval_Median_ENM1_Kappa,FeatEval_Median_ENM1_AUC, FeatEval_Median_mean_accuracy_cv_ENM1)
ModelTrain_stage_ElasticNet_metrics_Feature_Freq<-c(FeatEval_Freq_ENM1_trainAccuracy,
cm_FeatEval_Freq_ENM1_Accuracy,cm_FeatEval_Freq_ENM1_Kappa,FeatEval_Freq_ENM1_AUC,FeatEval_Freq_mean_accuracy_cv_ENM1)
ModelTrain_stage_ElasticNet_metrics<-c(ModelTrain_stage_ElasticNet_metrics_ModelTrainStage, ModelTrain_stage_ElasticNet_metrics_Feature_Mean,ModelTrain_stage_ElasticNet_metrics_Feature_Median,ModelTrain_stage_ElasticNet_metrics_Feature_Freq)
ModelTrain_stage_XGBoost_metrics_ModelTrainStage <- c(modelTrain_xgb_trainAccuracy, cm_modelTrain_xgb_Accuracy, cm_modelTrain_xgb_Kappa,modelTrain_xgb_AUC, modelTrain_mean_accuracy_cv_xgb)
ModelTrain_stage_XGBoost_metrics_Feature_Mean<-c(FeatEval_Mean_xgb_trainAccuracy,
cm_FeatEval_Mean_xgb_Accuracy,cm_FeatEval_Mean_xgb_Kappa,FeatEval_Mean_xgb_AUC, FeatEval_Mean_mean_accuracy_cv_xgb)
ModelTrain_stage_XGBoost_metrics_Feature_Median<-c(FeatEval_Median_xgb_trainAccuracy,
cm_FeatEval_Median_xgb_Accuracy,cm_FeatEval_Median_xgb_Kappa,FeatEval_Median_xgb_AUC, FeatEval_Median_mean_accuracy_cv_xgb)
ModelTrain_stage_XGBoost_metrics_Feature_Freq<-c(FeatEval_Freq_xgb_trainAccuracy,
cm_FeatEval_Freq_xgb_Accuracy,cm_FeatEval_Freq_xgb_Kappa,FeatEval_Freq_xgb_AUC,FeatEval_Freq_mean_accuracy_cv_xgb)
ModelTrain_stage_XGBoost_metrics<-c(ModelTrain_stage_XGBoost_metrics_ModelTrainStage, ModelTrain_stage_XGBoost_metrics_Feature_Mean,ModelTrain_stage_XGBoost_metrics_Feature_Median,ModelTrain_stage_XGBoost_metrics_Feature_Freq)
ModelTrain_stage_RandomForest_metrics_ModelTrainStage <- c(modelTrain_rf_trainAccuracy, cm_modelTrain_rf_Accuracy, cm_modelTrain_rf_Kappa,modelTrain_rf_AUC, modelTrain_mean_accuracy_cv_rf)
ModelTrain_stage_RandomForest_metrics_Feature_Mean<-c(FeatEval_Mean_rf_trainAccuracy,
cm_FeatEval_Mean_rf_Accuracy,cm_FeatEval_Mean_rf_Kappa,FeatEval_Mean_rf_AUC, FeatEval_Mean_mean_accuracy_cv_rf)
ModelTrain_stage_RandomForest_metrics_Feature_Median<-c(FeatEval_Median_rf_trainAccuracy,
cm_FeatEval_Median_rf_Accuracy,cm_FeatEval_Median_rf_Kappa,FeatEval_Median_rf_AUC, FeatEval_Median_mean_accuracy_cv_rf)
ModelTrain_stage_RandomForest_metrics_Feature_Freq<-c(FeatEval_Freq_rf_trainAccuracy,
cm_FeatEval_Freq_rf_Accuracy,cm_FeatEval_Freq_rf_Kappa,FeatEval_Freq_rf_AUC,FeatEval_Freq_mean_accuracy_cv_rf)
ModelTrain_stage_RandomForest_metrics<-c(ModelTrain_stage_RandomForest_metrics_ModelTrainStage, ModelTrain_stage_RandomForest_metrics_Feature_Mean,ModelTrain_stage_RandomForest_metrics_Feature_Median,ModelTrain_stage_RandomForest_metrics_Feature_Freq)
ModelTrain_stage_SVM_metrics_ModelTrainStage <- c(modelTrain_svm_trainAccuracy, cm_modelTrain_svm_Accuracy, cm_modelTrain_svm_Kappa,modelTrain_svm_AUC, modelTrain_mean_accuracy_cv_svm)
ModelTrain_stage_SVM_metrics_Feature_Mean<-c(FeatEval_Mean_svm_trainAccuracy,
cm_FeatEval_Mean_svm_Accuracy,cm_FeatEval_Mean_svm_Kappa,FeatEval_Mean_svm_AUC, FeatEval_Mean_mean_accuracy_cv_svm)
ModelTrain_stage_SVM_metrics_Feature_Median<-c(FeatEval_Median_svm_trainAccuracy,
cm_FeatEval_Median_svm_Accuracy,cm_FeatEval_Median_svm_Kappa,FeatEval_Median_svm_AUC, FeatEval_Median_mean_accuracy_cv_svm)
ModelTrain_stage_SVM_metrics_Feature_Freq<-c(FeatEval_Freq_svm_trainAccuracy,
cm_FeatEval_Freq_svm_Accuracy,cm_FeatEval_Freq_svm_Kappa,FeatEval_Freq_svm_AUC,FeatEval_Freq_mean_accuracy_cv_svm)
ModelTrain_stage_SVM_metrics<-c(ModelTrain_stage_SVM_metrics_ModelTrainStage, ModelTrain_stage_SVM_metrics_Feature_Mean,ModelTrain_stage_SVM_metrics_Feature_Median,ModelTrain_stage_SVM_metrics_Feature_Freq)
if(METHOD_FEATURE_FLAG==1){
classifcationType = "Multiclass"
}
if(METHOD_FEATURE_FLAG==2){
classifcationType = "Multiclass and use PCA"
}
if(METHOD_FEATURE_FLAG==3){
classifcationType = "Binary"
}
if(METHOD_FEATURE_FLAG==4){
classifcationType = "CN vs Dementia (AD)"
}
if(METHOD_FEATURE_FLAG==5){
classifcationType = "CN vs MCI"
}
if(METHOD_FEATURE_FLAG==6){
classifcationType = "MCI vs Dementia"
}
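The chain of `if` blocks above can be collapsed into a single `switch()` lookup over the flag (same flag values and labels as the original):

```r
classifcationType <- switch(as.character(METHOD_FEATURE_FLAG),
  "1" = "Multiclass",
  "2" = "Multiclass and use PCA",
  "3" = "Binary",
  "4" = "CN vs Dementia (AD)",
  "5" = "CN vs MCI",
  "6" = "MCI vs Dementia"
)
```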
library(dplyr)
Metrics_results_df <- data.frame(
`Number_of_CpG_used` = rep(Number_N_TopNCpGs, 20),
`Number_of_Phenotype_Features_Used` = rep(5, 20),
`Total_Number_of_features_before_Preprocessing` = rep(Number_N_TopNCpGs+5, 20),
`Number_of_features_after_processing` = rep(Num_feaForProcess, 20),
`Classification_Type` = rep(classifcationType, 20),
`Number_of_Key_features_Selected_(Mean,Median)` = rep(INPUT_NUMBER_FEATURES, 20),
`Number_of_Key_features_remained_based_on_frequency_methods` = rep(Num_KeyFea_Frequency, 20),
`Metrics_Stage` = c(rep("Model Train Stage",5),rep("Key Feature Evaluation (Select based on Mean) ",5),rep("Key Feature Evaluation (Select based on Median) ",5),rep("Key Feature Evaluation (Select based on Frequency) ",5)),
`Metric` = rep(Feature_and_model_Metrics, 4),
`Logistic_regression` = c(ModelTrain_stage_Logistic_metrics),
`Elastic_Net` = c(ModelTrain_stage_ElasticNet_metrics),
`XGBoost` = c(ModelTrain_stage_XGBoost_metrics),
`Random_Forest` = c(ModelTrain_stage_RandomForest_metrics),
`SVM` = c(ModelTrain_stage_SVM_metrics)
)
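`data.frame()` sanitizes column names by default, which is why `Number_of_Key_features_Selected_(Mean,Median)` appears as `Number_of_Key_features_Selected_.Mean.Median.` in the printout below; passing `check.names = FALSE` keeps such names verbatim (a minimal sketch with a hypothetical column name):

```r
# check.names = FALSE preserves parentheses and commas in column names.
demo_df <- data.frame(`Key_features_(Mean,Median)` = 1:3, check.names = FALSE)
colnames(demo_df)  # "Key_features_(Mean,Median)"
```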
print(Metrics_results_df)
## Number_of_CpG_used Number_of_Phenotype_Features_Used Total_Number_of_features_before_Preprocessing Number_of_features_after_processing Classification_Type
## 1 5000 5 5005 324 CN vs MCI
## 2 5000 5 5005 324 CN vs MCI
## 3 5000 5 5005 324 CN vs MCI
## 4 5000 5 5005 324 CN vs MCI
## 5 5000 5 5005 324 CN vs MCI
## 6 5000 5 5005 324 CN vs MCI
## 7 5000 5 5005 324 CN vs MCI
## 8 5000 5 5005 324 CN vs MCI
## 9 5000 5 5005 324 CN vs MCI
## 10 5000 5 5005 324 CN vs MCI
## 11 5000 5 5005 324 CN vs MCI
## 12 5000 5 5005 324 CN vs MCI
## 13 5000 5 5005 324 CN vs MCI
## 14 5000 5 5005 324 CN vs MCI
## 15 5000 5 5005 324 CN vs MCI
## 16 5000 5 5005 324 CN vs MCI
## 17 5000 5 5005 324 CN vs MCI
## 18 5000 5 5005 324 CN vs MCI
## 19 5000 5 5005 324 CN vs MCI
## 20 5000 5 5005 324 CN vs MCI
## Number_of_Key_features_Selected_.Mean.Median. Number_of_Key_features_remained_based_on_frequency_methods Metrics_Stage
## 1 250 275 Model Train Stage
## 2 250 275 Model Train Stage
## 3 250 275 Model Train Stage
## 4 250 275 Model Train Stage
## 5 250 275 Model Train Stage
## 6 250 275 Key Feature Evaluation (Select based on Mean)
## 7 250 275 Key Feature Evaluation (Select based on Mean)
## 8 250 275 Key Feature Evaluation (Select based on Mean)
## 9 250 275 Key Feature Evaluation (Select based on Mean)
## 10 250 275 Key Feature Evaluation (Select based on Mean)
## 11 250 275 Key Feature Evaluation (Select based on Median)
## 12 250 275 Key Feature Evaluation (Select based on Median)
## 13 250 275 Key Feature Evaluation (Select based on Median)
## 14 250 275 Key Feature Evaluation (Select based on Median)
## 15 250 275 Key Feature Evaluation (Select based on Median)
## 16 250 275 Key Feature Evaluation (Select based on Frequency)
## 17 250 275 Key Feature Evaluation (Select based on Frequency)
## 18 250 275 Key Feature Evaluation (Select based on Frequency)
## 19 250 275 Key Feature Evaluation (Select based on Frequency)
## 20 250 275 Key Feature Evaluation (Select based on Frequency)
## Metric Logistic_regression Elastic_Net XGBoost Random_Forest SVM
## 1 Training Accuracy 1.0000000 0.9357326 1.0000000 1.0000000 0.9897172
## 2 Test Accuracy 0.8484848 0.8424242 0.6909091 0.6666667 0.8727273
## 3 Test Kappa 0.6835443 0.6560847 0.3376623 0.2028986 0.7286822
## 4 AUC 0.9150597 0.9473523 0.7333946 0.7750995 0.9542394
## 5 Average Test Accuracy during Cross Validation 0.7797499 0.7161347 0.6442043 0.6358419 0.8370629
## 6 Training Accuracy 1.0000000 0.9897172 1.0000000 1.0000000 0.9974293
## 7 Test Accuracy 0.8303030 0.8424242 0.7212121 0.6363636 0.8727273
## 8 Test Kappa 0.6500000 0.6666667 0.4010417 0.1428571 0.7368421
## 9 AUC 0.8858280 0.9123049 0.7814509 0.7685950 0.9386287
## 10 Average Test Accuracy during Cross Validation 0.8097680 0.7415659 0.6622985 0.6452214 0.8594295
## 11 Training Accuracy 1.0000000 0.9922879 1.0000000 1.0000000 0.9922879
## 12 Test Accuracy 0.8121212 0.8424242 0.6909091 0.6606061 0.8424242
## 13 Test Kappa 0.6075949 0.6666667 0.3410853 0.2000000 0.6766169
## 14 AUC 0.8907254 0.9012856 0.7633915 0.7715029 0.9361800
## 15 Average Test Accuracy during Cross Validation 0.8220150 0.7440601 0.6720511 0.6451881 0.8448329
## 16 Training Accuracy 1.0000000 0.9897172 1.0000000 1.0000000 0.9922879
## 17 Test Accuracy 0.8484848 0.8484848 0.7272727 0.6484848 0.8727273
## 18 Test Kappa 0.6835443 0.6786632 0.4186047 0.1666667 0.7381546
## 19 AUC 0.9012856 0.9236302 0.7698194 0.7696664 0.9531680
## 20 Average Test Accuracy during Cross Validation 0.8026122 0.7411089 0.6586691 0.6358197 0.8226551
Write out the model-metrics data frame to a CSV file if FLAG_WRITE_METRICS_DF = TRUE.
if(FLAG_WRITE_METRICS_DF){
  write.csv(Metrics_results_df, OUTUT_PerformanceMetricsCSV_PATHNAME, row.names = FALSE)
  print("Metrics Performance output path:")
  print(OUTUT_PerformanceMetricsCSV_PATHNAME)
}
## [1] "Metrics Performance output path:"
## [1] "C:\\Users\\wangtia\\Desktop\\AD Risk\\part2\\VersionHistory\\Version7_AutoKnit_Results\\Method5_CN_vs_MCI\\Method5_CN_vs_MCI_PerformanceMetrics\\INPUT_5000CpGs_250SelFeature_PerMetrics.csv"
Phenotype part data frame: “phenoticPart_RAW”
Raw merged data frame: “merged_df_raw”
Processed data, i.e. the data used for model training.
The name for “processed_data” can be:
“processed_data_m1”, which uses method one to process the data.
“processed_data_m2”, which uses method two (PCA) to process the data; note that the features will be principal components.
“processed_data_m3”, which uses method three to process the data. This method converts “DX” to a binary class: “CN” stays the same, and “MCI” and “Dementia” are merged into “CI”.
Note that “processed_data_m3_df” is the data-frame form of “processed_data_m3”, with sample names as row names; it is assigned to “processed_dataFrame”.
“processed_data_m4”, which uses method four to process the data. This method filters “DX” (dropping the “MCI” class), keeping only the CN and Dementia (AD) classes.
“processed_data_m5”, which uses method five to process the data. This method filters “DX” (dropping the “Dementia” class), keeping only the CN and MCI classes.
“processed_data_m6”, which uses method six to process the data. This method filters “DX” (dropping the “CN” class), keeping only the MCI and Dementia classes.
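The recoding and filtering steps behind methods 3 to 6 can be sketched with dplyr. This is a minimal illustration, assuming “merged_df_raw” carries a “DX” column with the levels “CN”, “MCI”, and “Dementia”; it is not the exact processing code used above.

```r
library(dplyr)

# Method 3: collapse DX to two classes -- "CN" stays "CN",
# "MCI" and "Dementia" are merged into "CI".
processed_data_m3 <- merged_df_raw %>%
  mutate(DX = ifelse(DX == "CN", "CN", "CI"))

# Methods 4-6: keep only the two classes of interest by dropping the third.
processed_data_m4 <- merged_df_raw %>% filter(DX %in% c("CN", "Dementia"))   # CN vs AD
processed_data_m5 <- merged_df_raw %>% filter(DX %in% c("CN", "MCI"))        # CN vs MCI
processed_data_m6 <- merged_df_raw %>% filter(DX %in% c("MCI", "Dementia"))  # MCI vs AD
```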
The name for “AfterProcess_FeatureName” can be:
Feature importance ordered by quantile, data frame: “combined_importance_quantiles”
Feature importance ordered by mean, data frame: “combined_importance_Avg_ordered”
Feature frequency / common-feature data frames:
“frequency_feature_df_RAW_ordered”: the selected features’ frequencies, ordered by total frequency count. The top number selected in the first step is set in the input session by “INPUT_NUMBER_FEATURES”.
“feature_df_full”: the frequencies of all features from the steps of the frequency method; not ordered.
“all_combined_df_impAvg”: the combined table of frequency and feature importance; not ordered.
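The mean- and quantile-ordered importance tables can be built roughly as follows. This is an illustrative sketch only: the per-model importance vectors (imp_lr, imp_en, imp_xgb, imp_rf, imp_svm) are hypothetical names, assumed to be numeric vectors named by feature and aligned across models.

```r
# Combine per-model feature importances into one matrix (features x models).
importance_list <- list(lr = imp_lr, en = imp_en, xgb = imp_xgb,
                        rf = imp_rf, svm = imp_svm)
combined <- do.call(cbind, importance_list)

# Order features by their mean importance across models ...
combined_importance_Avg_ordered <-
  combined[order(rowMeans(combined), decreasing = TRUE), ]

# ... and by their median (quantile) importance across models.
combined_importance_quantiles <-
  combined[order(apply(combined, 1, median), decreasing = TRUE), ]
```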
Output data frame with features selected by the mean method: “df_selected_Mean”. This data frame does not have a column named “SampleID”.
Output data frame with features selected by the median method: “df_selected_Median”. This data frame does not have a column named “SampleID”.
Output data frame with features selected by the frequency / common-feature method: “df_process_Output_freq”. This data frame does not have a column named “SampleID”.
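As an example of how such an output frame can be derived, here is a minimal sketch for the mean method, assuming “combined_importance_Avg_ordered” carries feature names as row names and “processed_dataFrame” contains the outcome column “DX” (column names other than those documented above are hypothetical):

```r
# Take the top-ranked features from the mean-ordered importance table, then
# subset the processed data to those features plus the outcome, without SampleID.
selected_mean_features <-
  rownames(combined_importance_Avg_ordered)[seq_len(INPUT_NUMBER_FEATURES)]
df_selected_Mean <- processed_dataFrame[, c(selected_mean_features, "DX")]
```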
And the feature names: “df_process_frequency_FeatureName”
“df_feature_Output_frequency”: the selected features’ frequencies, ordered by total frequency count. The top number selected in the first step is set in the input session by “NUM_COMMON_FEATURES_SET_Frequency”.
“Selected_Frequency_Feature_importance”: the importance values of the selected features, ordered by total frequency count.
“feature_output_df_full”: the frequencies of all features from the steps of the frequency method; not ordered.
“all_Output_combined_df_impAvg”: the combined table of frequency and feature importance; not ordered.
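The frequency / common-feature idea can be sketched as a voting scheme: each model contributes its top-K features, and a feature’s frequency is the number of models that selected it. The sketch below is illustrative, assuming a hypothetical list of per-model importance vectors named by feature; it is not the exact frequency-method code.

```r
# Hypothetical inputs: named numeric importance vectors, one per model.
importance_list <- list(lr = imp_lr, en = imp_en, xgb = imp_xgb,
                        rf = imp_rf, svm = imp_svm)

# Each model votes for its top INPUT_NUMBER_FEATURES features.
top_k_per_model <- lapply(importance_list, function(imp) {
  names(sort(imp, decreasing = TRUE))[seq_len(INPUT_NUMBER_FEATURES)]
})

# Count how many models selected each feature, then order by total count.
freq_tab <- table(unlist(top_k_per_model))
frequency_feature_df_RAW_ordered <- data.frame(
  Feature     = names(freq_tab),
  Total_Count = as.integer(freq_tab)
)
frequency_feature_df_RAW_ordered <-
  frequency_feature_df_RAW_ordered[order(-frequency_feature_df_RAW_ordered$Total_Count), ]
```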
Number of CpGs used: “Number_N_TopNCpGs”
Phenotype features selected:
Number of features before processing: (# phenotype features selected) + (# CpGs used)
Number of features after processing (DMP, data cleaning): “Num_feaForProcess”
Model performance (variable names), model training stage:
| Initial Model Training Metric | Logistic regression | Elastic Net | XGBoost | Random Forest | SVM |
|---|---|---|---|---|---|
| Training Accuracy | modelTrain_LRM1_trainAccuracy | modelTrain_ENM1_trainAccuracy | modelTrain_xgb_trainAccuracy | modelTrain_rf_trainAccuracy | modelTrain_svm_trainAccuracy |
| Test Accuracy | cm_modelTrain_LRM1_Accuracy | cm_modelTrain_ENM1_Accuracy | cm_modelTrain_xgb_Accuracy | cm_modelTrain_rf_Accuracy | cm_modelTrain_svm_Accuracy |
| Test Kappa | cm_modelTrain_LRM1_Kappa | cm_modelTrain_ENM1_Kappa | cm_modelTrain_xgb_Kappa | cm_modelTrain_rf_Kappa | cm_modelTrain_svm_Kappa |
| AUC (for multi-class, mean AUC with the one-vs-rest method) | modelTrain_LRM1_AUC | modelTrain_ENM1_AUC | modelTrain_xgb_AUC | modelTrain_rf_AUC | modelTrain_svm_AUC |
| Average Test Accuracy during Cross Validation | modelTrain_mean_accuracy_cv_LRM1 | modelTrain_mean_accuracy_cv_ENM1 | modelTrain_mean_accuracy_cv_xgb | modelTrain_mean_accuracy_cv_rf | modelTrain_mean_accuracy_cv_svm |
Number of key features selected (mean/median methods): “INPUT_NUMBER_FEATURES”
Number of key features retained by the frequency method: “Num_KeyFea_Frequency”
Performance of the set of key features (selected under the 3 methods):
Based on Mean:
| Key Features Performance Selected based on Mean | Logistic Regression | Elastic Net | XGBoost | Random Forest | SVM |
|---|---|---|---|---|---|
| Training Accuracy | FeatEval_Mean_LRM1_trainAccuracy | FeatEval_Mean_ENM1_trainAccuracy | FeatEval_Mean_xgb_trainAccuracy | FeatEval_Mean_rf_trainAccuracy | FeatEval_Mean_svm_trainAccuracy |
| Test Accuracy | cm_FeatEval_Mean_LRM1_Accuracy | cm_FeatEval_Mean_ENM1_Accuracy | cm_FeatEval_Mean_xgb_Accuracy | cm_FeatEval_Mean_rf_Accuracy | cm_FeatEval_Mean_svm_Accuracy |
| Test Kappa | cm_FeatEval_Mean_LRM1_Kappa | cm_FeatEval_Mean_ENM1_Kappa | cm_FeatEval_Mean_xgb_Kappa | cm_FeatEval_Mean_rf_Kappa | cm_FeatEval_Mean_svm_Kappa |
| AUC (for multi-class, mean AUC with the one-vs-rest method) | FeatEval_Mean_LRM1_AUC | FeatEval_Mean_ENM1_AUC | FeatEval_Mean_xgb_AUC | FeatEval_Mean_rf_AUC | FeatEval_Mean_svm_AUC |
| Average Test Accuracy during Cross Validation | FeatEval_Mean_mean_accuracy_cv_LRM1 | FeatEval_Mean_mean_accuracy_cv_ENM1 | FeatEval_Mean_mean_accuracy_cv_xgb | FeatEval_Mean_mean_accuracy_cv_rf | FeatEval_Mean_mean_accuracy_cv_svm |
Based on Median:
| Key Features Performance Selected based on Median | Logistic Regression | Elastic Net | XGBoost | Random Forest | SVM |
|---|---|---|---|---|---|
| Training Accuracy | FeatEval_Median_LRM1_trainAccuracy | FeatEval_Median_ENM1_trainAccuracy | FeatEval_Median_xgb_trainAccuracy | FeatEval_Median_rf_trainAccuracy | FeatEval_Median_svm_trainAccuracy |
| Test Accuracy | cm_FeatEval_Median_LRM1_Accuracy | cm_FeatEval_Median_ENM1_Accuracy | cm_FeatEval_Median_xgb_Accuracy | cm_FeatEval_Median_rf_Accuracy | cm_FeatEval_Median_svm_Accuracy |
| Test Kappa | cm_FeatEval_Median_LRM1_Kappa | cm_FeatEval_Median_ENM1_Kappa | cm_FeatEval_Median_xgb_Kappa | cm_FeatEval_Median_rf_Kappa | cm_FeatEval_Median_svm_Kappa |
| AUC (for multi-class, mean AUC with the one-vs-rest method) | FeatEval_Median_LRM1_AUC | FeatEval_Median_ENM1_AUC | FeatEval_Median_xgb_AUC | FeatEval_Median_rf_AUC | FeatEval_Median_svm_AUC |
| Average Test Accuracy during Cross Validation | FeatEval_Median_mean_accuracy_cv_LRM1 | FeatEval_Median_mean_accuracy_cv_ENM1 | FeatEval_Median_mean_accuracy_cv_xgb | FeatEval_Median_mean_accuracy_cv_rf | FeatEval_Median_mean_accuracy_cv_svm |
Based on Frequency:
| Key Features Performance Selected based on Frequency | Logistic Regression | Elastic Net | XGBoost | Random Forest | SVM |
|---|---|---|---|---|---|
| Training Accuracy | FeatEval_Freq_LRM1_trainAccuracy | FeatEval_Freq_ENM1_trainAccuracy | FeatEval_Freq_xgb_trainAccuracy | FeatEval_Freq_rf_trainAccuracy | FeatEval_Freq_svm_trainAccuracy |
| Test Accuracy | cm_FeatEval_Freq_LRM1_Accuracy | cm_FeatEval_Freq_ENM1_Accuracy | cm_FeatEval_Freq_xgb_Accuracy | cm_FeatEval_Freq_rf_Accuracy | cm_FeatEval_Freq_svm_Accuracy |
| Test Kappa | cm_FeatEval_Freq_LRM1_Kappa | cm_FeatEval_Freq_ENM1_Kappa | cm_FeatEval_Freq_xgb_Kappa | cm_FeatEval_Freq_rf_Kappa | cm_FeatEval_Freq_svm_Kappa |
| AUC (for multi-class, mean AUC with the one-vs-rest method) | FeatEval_Freq_LRM1_AUC | FeatEval_Freq_ENM1_AUC | FeatEval_Freq_xgb_AUC | FeatEval_Freq_rf_AUC | FeatEval_Freq_svm_AUC |
| Average Test Accuracy during Cross Validation | FeatEval_Freq_mean_accuracy_cv_LRM1 | FeatEval_Freq_mean_accuracy_cv_ENM1 | FeatEval_Freq_mean_accuracy_cv_xgb | FeatEval_Freq_mean_accuracy_cv_rf | FeatEval_Freq_mean_accuracy_cv_svm |